Re: [PATCH v2 2/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Mon, Oct 04, 2021 at 09:38:08PM +0200, Christian Schoenebeck wrote: > Raise the maximum possible virtio transfer size to 128M > (more precisely: 32k * PAGE_SIZE). See previous commit for a > more detailed explanation for the reasons of this change. > > For not breaking any virtio user, all virtio users transition > to using the new macro VIRTQUEUE_LEGACY_MAX_SIZE instead of > VIRTQUEUE_MAX_SIZE, so they are all still using the old value > of 1k with this commit. > > On the long-term, each virtio user should subsequently either > switch from VIRTQUEUE_LEGACY_MAX_SIZE to VIRTQUEUE_MAX_SIZE > after checking that they support the new value of 32k, or > otherwise they should replace the VIRTQUEUE_LEGACY_MAX_SIZE > macro by an appropriate value supported by them. > > Signed-off-by: Christian Schoenebeck I don't think we need this. Legacy isn't descriptive either. Just leave VIRTQUEUE_MAX_SIZE alone, and come up with a new name for 32k. > --- > hw/9pfs/virtio-9p-device.c | 2 +- > hw/block/vhost-user-blk.c | 6 +++--- > hw/block/virtio-blk.c | 6 +++--- > hw/char/virtio-serial-bus.c| 2 +- > hw/input/virtio-input.c| 2 +- > hw/net/virtio-net.c| 12 ++-- > hw/scsi/virtio-scsi.c | 2 +- > hw/virtio/vhost-user-fs.c | 6 +++--- > hw/virtio/vhost-user-i2c.c | 2 +- > hw/virtio/vhost-vsock-common.c | 2 +- > hw/virtio/virtio-balloon.c | 2 +- > hw/virtio/virtio-crypto.c | 2 +- > hw/virtio/virtio-iommu.c | 2 +- > hw/virtio/virtio-mem.c | 2 +- > hw/virtio/virtio-mmio.c| 4 ++-- > hw/virtio/virtio-pmem.c| 2 +- > hw/virtio/virtio-rng.c | 3 ++- > include/hw/virtio/virtio.h | 20 +++- > 18 files changed, 49 insertions(+), 30 deletions(-) > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c > index cd5d95dd51..9013e7df6e 100644 > --- a/hw/9pfs/virtio-9p-device.c > +++ b/hw/9pfs/virtio-9p-device.c > @@ -217,7 +217,7 @@ static void virtio_9p_device_realize(DeviceState *dev, > Error **errp) > > v->config_size = sizeof(struct virtio_9p_config) + strlen(s->fsconf.tag); > 
virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size, > -VIRTQUEUE_MAX_SIZE); > +VIRTQUEUE_LEGACY_MAX_SIZE); > v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output); > } > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > index 336f56705c..e5e45262ab 100644 > --- a/hw/block/vhost-user-blk.c > +++ b/hw/block/vhost-user-blk.c > @@ -480,9 +480,9 @@ static void vhost_user_blk_device_realize(DeviceState > *dev, Error **errp) > error_setg(errp, "queue size must be non-zero"); > return; > } > -if (s->queue_size > VIRTQUEUE_MAX_SIZE) { > +if (s->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) { > error_setg(errp, "queue size must not exceed %d", > - VIRTQUEUE_MAX_SIZE); > + VIRTQUEUE_LEGACY_MAX_SIZE); > return; > } > > @@ -491,7 +491,7 @@ static void vhost_user_blk_device_realize(DeviceState > *dev, Error **errp) > } > > virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, > -sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE); > +sizeof(struct virtio_blk_config), VIRTQUEUE_LEGACY_MAX_SIZE); > > s->virtqs = g_new(VirtQueue *, s->num_queues); > for (i = 0; i < s->num_queues; i++) { > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > index 9c0f46815c..5883e3e7db 100644 > --- a/hw/block/virtio-blk.c > +++ b/hw/block/virtio-blk.c > @@ -1171,10 +1171,10 @@ static void virtio_blk_device_realize(DeviceState > *dev, Error **errp) > return; > } > if (!is_power_of_2(conf->queue_size) || > -conf->queue_size > VIRTQUEUE_MAX_SIZE) { > +conf->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) { > error_setg(errp, "invalid queue-size property (%" PRIu16 "), " > "must be a power of 2 (max %d)", > - conf->queue_size, VIRTQUEUE_MAX_SIZE); > + conf->queue_size, VIRTQUEUE_LEGACY_MAX_SIZE); > return; > } > > @@ -1214,7 +1214,7 @@ static void virtio_blk_device_realize(DeviceState *dev, > Error **errp) > virtio_blk_set_config_size(s, s->host_features); > > virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size, > -VIRTQUEUE_MAX_SIZE); > +VIRTQUEUE_LEGACY_MAX_SIZE); 
> > s->blk = conf->conf.blk; > s->rq = NULL; > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c > index 9ad915..2d4285ab53 100644 > --- a/hw/char/virtio-serial-bus.c > +++ b/hw/char/virtio-serial-bus.c > @@ -1045,7 +1045,7 @@ static void virtio_serial_device_realize(DeviceState > *dev, Error **errp) > config_size = offsetof(struct virtio_console_config, emerg_wr); > } > virtio_init(vdev, "virtio-serial"
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Mon, 4 Oct 2021 09:11:04 -0400 "Michael S. Tsirkin" wrote:

> > >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> > >> {
> > >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> > >>     return virtio_is_big_endian(vdev);
> > >> #elif defined(TARGET_WORDS_BIGENDIAN)
> > >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> > >>         return false;
> > >>     }
> > >>     return true;
> > >> #else
> > >>     return false;
> > >> #endif
> > >> }
> > >>
> > >
> > > ok so that's a QEMU bug. Any virtio 1.0 and up
> > > compatible device must use LE.
> > > It can also present a legacy config space where the
> > > endian depends on the guest.
> >
> > So, how is the virtio core supposed to determine this? A
> > transport-specific callback?
>
> I'd say a field in VirtIODevice is easiest.

Wouldn't a call from transport code into virtio core be more handy? What I have in mind is stuff like vhost-user and vdpa. My understanding is that for vhost setups where the config is outside QEMU, we probably need a new command that tells the vhost backend what endianness to use for config. I don't think we can use VHOST_USER_SET_VRING_ENDIAN because that one is on a virtqueue basis according to the doc. So for vhost-user and similar we would fire that command and probably also set the field, while for devices for which the control plane is handled by QEMU we would just set the field.

Does that sound about right?
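To make the "field in VirtIODevice" idea concrete, here is a minimal standalone sketch. It is not QEMU code: `ToyVirtioDevice`, `legacy_config_is_be` and the helper names are hypothetical stand-ins, and it only assumes what the thread states, namely that a virtio 1.0 (or later) device is always little-endian while a legacy config space follows the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not QEMU's VirtIODevice. */
typedef struct {
    bool version_1_negotiated;  /* models VIRTIO_F_VERSION_1 being set */
    bool legacy_config_is_be;   /* assumed field, set by transport or core */
} ToyVirtioDevice;

/* Config space is LE for virtio 1.0+; otherwise use the stored field. */
static bool toy_config_is_big_endian(const ToyVirtioDevice *vdev)
{
    if (vdev->version_1_negotiated) {
        return false;
    }
    return vdev->legacy_config_is_be;
}

/* Decode a 16-bit config field from raw bytes with the chosen byte order. */
static uint16_t toy_config_read16(const ToyVirtioDevice *vdev,
                                  const uint8_t *p)
{
    if (toy_config_is_big_endian(vdev)) {
        return (uint16_t)((p[0] << 8) | p[1]);
    }
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

With a field like this, a backend could consult the stored value whenever config space is read, and the open question in the thread is only who sets it and when.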
Re: [PATCH v2 1/2] hw/adc: Add basic Aspeed ADC model
On 10/5/21 07:31, Peter Delevoryas wrote:
> > On Oct 4, 2021, at 12:49 AM, Cédric Le Goater wrote:
> >
> > On 10/3/21 21:18, p...@fb.com wrote:
> >> From: Andrew Jeffery
> >>
> >> This model implements enough behaviour to do basic functionality tests
> >> such as device initialisation and read out of dummy sample values. The
> >> sample value generation strategy is similar to the STM ADC already in
> >> the tree.
> >>
> >> Signed-off-by: Andrew Jeffery
> >> [clg : support for multiple engines (AST2600) ]
> >> Signed-off-by: Cédric Le Goater
> >> [pdel : refactored engine register struct fields to regs[] array field]
> >> [pdel : added guest-error checking for upper-8 channel regs in AST2600]
> >> Signed-off-by: Peter Delevoryas
> >
> > Reviewed-by: Cédric Le Goater
> >
> > Thanks,
> >
> > C.
>
> Hey Cedric,
>
> Actually, I have just submitted a v3 of this patch series to support
> 16-bit reads of the channel data registers. I don’t think I tested
> using the driver to read from the ADC, and that’s what Patrick found
> crashed with these changes. Since it’s relatively easy to enable 16-bit
> reads, I figured I would just include that.

OK. A Tested-by: tag would be welcome!

Thanks,

C.
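As a rough illustration of what "16-bit reads of the channel data registers" can involve, the sketch below serves 2-byte and 4-byte reads from a `regs[]` array. Everything here is hypothetical (`ToyAdc`, `toy_adc_read`, the register count), and it assumes a little-endian layout in which each 32-bit register holds two 16-bit channel slots; that layout is an assumption for the example, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_REGS 4  /* hypothetical register count for the example */

/* Hypothetical device state: a flat array of 32-bit registers. */
typedef struct {
    uint32_t regs[TOY_NR_REGS];
} ToyAdc;

/*
 * Serve a naturally aligned read of `size` bytes (2 or 4) at byte offset
 * `addr`. A 16-bit read selects the low or high half of the containing
 * 32-bit register. Out-of-range offsets read as zero.
 */
static uint32_t toy_adc_read(const ToyAdc *s, uint32_t addr, unsigned size)
{
    if (addr / 4 >= TOY_NR_REGS) {
        return 0;
    }
    uint32_t reg = s->regs[addr / 4];
    if (size == 2) {
        /* (addr & 2) is 0 for the low half and 2 for the high half. */
        return (reg >> ((addr & 2) * 8)) & 0xffff;
    }
    return reg;
}
```

A guest driver that reads one channel at a time would then hit the `size == 2` path instead of an unsupported access size.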
Re: [PATCH v2 2/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Tue, 5 Oct 2021 03:16:07 -0400 "Michael S. Tsirkin" wrote:

> On Mon, Oct 04, 2021 at 09:38:08PM +0200, Christian Schoenebeck wrote:
> > Raise the maximum possible virtio transfer size to 128M
> > (more precisely: 32k * PAGE_SIZE). See previous commit for a
> > more detailed explanation for the reasons of this change.
> >
> > For not breaking any virtio user, all virtio users transition
> > to using the new macro VIRTQUEUE_LEGACY_MAX_SIZE instead of
> > VIRTQUEUE_MAX_SIZE, so they are all still using the old value
> > of 1k with this commit.
> >
> > On the long-term, each virtio user should subsequently either
> > switch from VIRTQUEUE_LEGACY_MAX_SIZE to VIRTQUEUE_MAX_SIZE
> > after checking that they support the new value of 32k, or
> > otherwise they should replace the VIRTQUEUE_LEGACY_MAX_SIZE
> > macro by an appropriate value supported by them.
> >
> > Signed-off-by: Christian Schoenebeck
>
>
> I don't think we need this. Legacy isn't descriptive either. Just leave
> VIRTQUEUE_MAX_SIZE alone, and come up with a new name for 32k.
>

Yes I agree. Only virtio-9p is going to benefit from the new size in the short/medium term, so it looks a bit excessive to patch all devices. Also in the end, you end up reverting the name change in the last patch for virtio-9p... which is an indication that this patch does too much. Introduce the new macro in virtio-9p and use it only there.

> > --- > > hw/9pfs/virtio-9p-device.c | 2 +- > > hw/block/vhost-user-blk.c | 6 +++--- > > hw/block/virtio-blk.c | 6 +++--- > > hw/char/virtio-serial-bus.c| 2 +- > > hw/input/virtio-input.c| 2 +- > > hw/net/virtio-net.c| 12 ++-- > > hw/scsi/virtio-scsi.c | 2 +- > > hw/virtio/vhost-user-fs.c | 6 +++--- > > hw/virtio/vhost-user-i2c.c | 2 +- > > hw/virtio/vhost-vsock-common.c | 2 +- > > hw/virtio/virtio-balloon.c | 2 +- > > hw/virtio/virtio-crypto.c | 2 +- > > hw/virtio/virtio-iommu.c | 2 +- > > hw/virtio/virtio-mem.c | 2 +- > > hw/virtio/virtio-mmio.c| 4 ++-- > > hw/virtio/virtio-pmem.c| 2 +- > > hw/virtio/virtio-rng.c | 3 ++- > > include/hw/virtio/virtio.h | 20 +++- > > 18 files changed, 49 insertions(+), 30 deletions(-) > > > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c > > index cd5d95dd51..9013e7df6e 100644 > > --- a/hw/9pfs/virtio-9p-device.c > > +++ b/hw/9pfs/virtio-9p-device.c > > @@ -217,7 +217,7 @@ static void virtio_9p_device_realize(DeviceState *dev, > > Error **errp) > > > > v->config_size = sizeof(struct virtio_9p_config) + > > strlen(s->fsconf.tag); > > virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size, > > -VIRTQUEUE_MAX_SIZE); > > +VIRTQUEUE_LEGACY_MAX_SIZE); > > v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output); > > } > > > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > > index 336f56705c..e5e45262ab 100644 > > --- a/hw/block/vhost-user-blk.c > > +++ b/hw/block/vhost-user-blk.c > > @@ -480,9 +480,9 @@ static void vhost_user_blk_device_realize(DeviceState > > *dev, Error **errp) > > error_setg(errp, "queue size must be non-zero"); > > return; > > } > > -if (s->queue_size > VIRTQUEUE_MAX_SIZE) { > > +if (s->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) { > > error_setg(errp, "queue size must not exceed %d", > > - VIRTQUEUE_MAX_SIZE); > > + VIRTQUEUE_LEGACY_MAX_SIZE); > > return; > > } > > > > @@ -491,7 +491,7 @@ static void vhost_user_blk_device_realize(DeviceState > > *dev, Error **errp) > 
> } > > > > virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, > > -sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE); > > +sizeof(struct virtio_blk_config), > > VIRTQUEUE_LEGACY_MAX_SIZE); > > > > s->virtqs = g_new(VirtQueue *, s->num_queues); > > for (i = 0; i < s->num_queues; i++) { > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > > index 9c0f46815c..5883e3e7db 100644 > > --- a/hw/block/virtio-blk.c > > +++ b/hw/block/virtio-blk.c > > @@ -1171,10 +1171,10 @@ static void virtio_blk_device_realize(DeviceState > > *dev, Error **errp) > > return; > > } > > if (!is_power_of_2(conf->queue_size) || > > -conf->queue_size > VIRTQUEUE_MAX_SIZE) { > > +conf->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) { > > error_setg(errp, "invalid queue-size property (%" PRIu16 "), " > > "must be a power of 2 (max %d)", > > - conf->queue_size, VIRTQUEUE_MAX_SIZE); > > + conf->queue_size, VIRTQUEUE_LEGACY_MAX_SIZE); > > return; > > } > > > > @@ -1214,7 +1214,7 @@ static void virtio_blk_device_realize(DeviceState > > *dev, Error **errp) > > virtio_blk_set_config_size(s, s->host_feature
Re: [PATCH v2 1/3] virtio: turn VIRTQUEUE_MAX_SIZE into a variable
On Mon, 4 Oct 2021 21:38:04 +0200 Christian Schoenebeck wrote: > Refactor VIRTQUEUE_MAX_SIZE to effectively become a runtime > variable per virtio user. > > Reasons: > > (1) VIRTQUEUE_MAX_SIZE should reflect the absolute theoretical > maximum queue size possible. Which is actually the maximum > queue size allowed by the virtio protocol. The appropriate > value for VIRTQUEUE_MAX_SIZE would therefore be 32768: > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006 > > Apparently VIRTQUEUE_MAX_SIZE was instead defined with a > more or less arbitrary value of 1024 in the past, which > limits the maximum transfer size with virtio to 4M > (more precise: 1024 * PAGE_SIZE, with the latter typically > being 4k). > > (2) Additionally the current value of 1024 poses a hidden limit, > invisible to guest, which causes a system hang with the > following QEMU error if guest tries to exceed it: > > virtio: too many write descriptors in indirect table > > (3) Unfortunately not all virtio users in QEMU would currently > work correctly with the new value of 32768. > > So let's turn this hard coded global value into a runtime > variable as a first step in this commit, configurable for each > virtio user by passing a corresponding value with virtio_init() > call. 
> > Signed-off-by: Christian Schoenebeck > --- Reviewed-by: Greg Kurz > hw/9pfs/virtio-9p-device.c | 3 ++- > hw/block/vhost-user-blk.c | 2 +- > hw/block/virtio-blk.c | 3 ++- > hw/char/virtio-serial-bus.c| 2 +- > hw/display/virtio-gpu-base.c | 2 +- > hw/input/virtio-input.c| 2 +- > hw/net/virtio-net.c| 15 --- > hw/scsi/virtio-scsi.c | 2 +- > hw/virtio/vhost-user-fs.c | 2 +- > hw/virtio/vhost-user-i2c.c | 3 ++- > hw/virtio/vhost-vsock-common.c | 2 +- > hw/virtio/virtio-balloon.c | 4 ++-- > hw/virtio/virtio-crypto.c | 3 ++- > hw/virtio/virtio-iommu.c | 2 +- > hw/virtio/virtio-mem.c | 2 +- > hw/virtio/virtio-pmem.c| 2 +- > hw/virtio/virtio-rng.c | 2 +- > hw/virtio/virtio.c | 35 +++--- > include/hw/virtio/virtio.h | 5 - > 19 files changed, 57 insertions(+), 36 deletions(-) > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c > index 54ee93b71f..cd5d95dd51 100644 > --- a/hw/9pfs/virtio-9p-device.c > +++ b/hw/9pfs/virtio-9p-device.c > @@ -216,7 +216,8 @@ static void virtio_9p_device_realize(DeviceState *dev, > Error **errp) > } > > v->config_size = sizeof(struct virtio_9p_config) + strlen(s->fsconf.tag); > -virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size); > +virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size, > +VIRTQUEUE_MAX_SIZE); > v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output); > } > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > index ba13cb87e5..336f56705c 100644 > --- a/hw/block/vhost-user-blk.c > +++ b/hw/block/vhost-user-blk.c > @@ -491,7 +491,7 @@ static void vhost_user_blk_device_realize(DeviceState > *dev, Error **errp) > } > > virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, > -sizeof(struct virtio_blk_config)); > +sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE); > > s->virtqs = g_new(VirtQueue *, s->num_queues); > for (i = 0; i < s->num_queues; i++) { > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > index f139cd7cc9..9c0f46815c 100644 > --- 
a/hw/block/virtio-blk.c > +++ b/hw/block/virtio-blk.c > @@ -1213,7 +1213,8 @@ static void virtio_blk_device_realize(DeviceState *dev, > Error **errp) > > virtio_blk_set_config_size(s, s->host_features); > > -virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size); > +virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size, > +VIRTQUEUE_MAX_SIZE); > > s->blk = conf->conf.blk; > s->rq = NULL; > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c > index f01ec2137c..9ad915 100644 > --- a/hw/char/virtio-serial-bus.c > +++ b/hw/char/virtio-serial-bus.c > @@ -1045,7 +1045,7 @@ static void virtio_serial_device_realize(DeviceState > *dev, Error **errp) > config_size = offsetof(struct virtio_console_config, emerg_wr); > } > virtio_init(vdev, "virtio-serial", VIRTIO_ID_CONSOLE, > -config_size); > +config_size, VIRTQUEUE_MAX_SIZE); > > /* Spawn a new virtio-serial bus on which the ports will ride as devices > */ > qbus_init(&vser->bus, sizeof(vser->bus), TYPE_VIRTIO_SERIAL_BUS, > diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c > index c8da4806e0..20b06a7adf 100644 > --- a/hw/display/virtio-gpu-base.c > +++ b/hw/display/virtio-gpu-base.c > @@ -171,7 +171,7 @@ virtio_gpu_ba
Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On 04.10.21 21:38, Christian Schoenebeck wrote:
> At the moment the maximum transfer size with virtio is limited to 4M
> (1024 * PAGE_SIZE). This series raises this limit to its maximum
> theoretical possible transfer size of 128M (32k pages) according to the
> virtio specs:
>
> https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006

I'm missing the "why do we care". Can you comment on that?

-- 
Thanks,

David / dhildenb
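For reference, the 4M and 128M figures quoted in this thread follow directly from queue size times page size. The sketch below spells out that arithmetic together with the kind of queue-size check the virtio-blk hunks perform (power of two, not above the maximum). The names are hypothetical and the 4k page size is the assumption stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumption: 4k pages, as in the commit message */

/* Upper bound on one transfer: one page per descriptor in the queue. */
static uint64_t toy_max_transfer_bytes(uint32_t queue_size)
{
    return (uint64_t)queue_size * TOY_PAGE_SIZE;
}

/* A queue size is accepted if it is a non-zero power of two and <= max. */
static bool toy_queue_size_valid(uint32_t size, uint32_t max)
{
    return size != 0 && (size & (size - 1)) == 0 && size <= max;
}
```

With these definitions, a queue of 1024 entries gives 4 MiB and a queue of 32768 entries gives 128 MiB, which is the difference the series is about.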
Re: Strange qemu6 regression causing disabled usb controller.
On Thu, Sep 30, 2021 at 04:05:52PM +0100, Daniel P. Berrangé wrote:
> Co-incidentally we've just had another bug report filed today that
> suggests 7bed89958bfbf40df9ca681cefbdca63abdde39d as a buggy commit
> causing deadlock in QEMU
>
> https://gitlab.com/qemu-project/qemu/-/issues/650

Is opening a GitLab ticket the preferred way to report issues now? Should I do that?

Thanks.

Remy.
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05, 2021 at 09:25:39AM +0200, Halil Pasic wrote:
> On Mon, 4 Oct 2021 09:11:04 -0400
> "Michael S. Tsirkin" wrote:
>
> > > >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> > > >> {
> > > >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> > > >>     return virtio_is_big_endian(vdev);
> > > >> #elif defined(TARGET_WORDS_BIGENDIAN)
> > > >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> > > >>         return false;
> > > >>     }
> > > >>     return true;
> > > >> #else
> > > >>     return false;
> > > >> #endif
> > > >> }
> > > >>
> > > >
> > > > ok so that's a QEMU bug. Any virtio 1.0 and up
> > > > compatible device must use LE.
> > > > It can also present a legacy config space where the
> > > > endian depends on the guest.
> > >
> > > So, how is the virtio core supposed to determine this? A
> > > transport-specific callback?
> >
> > I'd say a field in VirtIODevice is easiest.
>
> Wouldn't a call from transport code into virtio core
> be more handy? What I have in mind is stuff like vhost-user and vdpa. My
> understanding is that for vhost setups where the config is outside QEMU,
> we probably need a new command that tells the vhost backend what
> endianness to use for config. I don't think we can use
> VHOST_USER_SET_VRING_ENDIAN because that one is on a virtqueue basis
> according to the doc. So for vhost-user and similar we would fire that
> command and probably also set the field, while for devices for which
> control plane is handled by QEMU we would just set the field.
>
> Does that sound about right?

I'm fine either way, but when would you invoke this? With my idea backends can check the field when get_config is invoked.

As for using this in VHOST, can we maybe re-use SET_FEATURES?

Kind of hacky but nice in that it will actually make existing backends work...

-- 
MST
Re: [PATCH V3] block/rbd: implement bdrv_co_block_status
On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven wrote: > > the qemu rbd driver currently lacks support for bdrv_co_block_status. > This results mainly in incorrect progress during block operations (e.g. > qemu-img convert with an rbd image as source). > > This patch utilizes the rbd_diff_iterate2 call from librbd to detect > allocated and unallocated (all zero areas). > > To avoid querying the ceph OSDs for the answer this is only done if > the image has the fast-diff feature which depends on the object-map and > exclusive-lock features. In this case it is guaranteed that the information > is present in memory in the librbd client and thus very fast. > > If fast-diff is not available all areas are reported to be allocated > which is the current behaviour if bdrv_co_block_status is not implemented. > > Signed-off-by: Peter Lieven > --- > V2->V3: > - check rbd_flags every time (they can change during runtime) [Ilya] > - also check for fast-diff invalid flag [Ilya] > - *map and *file cant be NULL [Ilya] > - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an > unallocated area [Ilya] > - typo: catched -> caught [Ilya] > - changed wording about fast-diff, object-map and exclusive lock in > commit msg [Ilya] > > V1->V2: > - add commit comment [Stefano] > - use failed_post_open [Stefano] > - remove redundant assert [Stefano] > - add macro+comment for the magic -9000 value [Stefano] > - always set *file if its non NULL [Stefano] > > block/rbd.c | 126 > 1 file changed, 126 insertions(+) > > diff --git a/block/rbd.c b/block/rbd.c > index dcf82b15b8..3cb24f9981 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -1259,6 +1259,131 @@ static ImageInfoSpecific > *qemu_rbd_get_specific_info(BlockDriverState *bs, > return spec_info; > } > > +typedef struct rbd_diff_req { > +uint64_t offs; > +uint64_t bytes; > +int exists; Hi Peter, Nit: make exists a bool. The one in the callback has to be an int because of the callback signature but let's not spread that. 
> +} rbd_diff_req; > + > +/* > + * rbd_diff_iterate2 allows to interrupt the exection by returning a negative > + * value in the callback routine. Choose a value that does not conflict with > + * an existing exitcode and return it if we want to prematurely stop the > + * execution because we detected a change in the allocation status. > + */ > +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000 > + > +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len, > + int exists, void *opaque) > +{ > +struct rbd_diff_req *req = opaque; > + > +assert(req->offs + req->bytes <= offs); > + > +if (req->exists && offs > req->offs + req->bytes) { > +/* > + * we started in an allocated area and jumped over an unallocated > area, > + * req->bytes contains the length of the allocated area before the > + * unallocated area. stop further processing. > + */ > +return QEMU_RBD_EXIT_DIFF_ITERATE2; > +} > +if (req->exists && !exists) { > +/* > + * we started in an allocated area and reached a hole. req->bytes > + * contains the length of the allocated area before the hole. > + * stop further processing. > + */ > +return QEMU_RBD_EXIT_DIFF_ITERATE2; Do you have a test case for when this branch is taken? > +} > +if (!req->exists && exists && offs > req->offs) { > +/* > + * we started in an unallocated area and hit the first allocated > + * block. req->bytes must be set to the length of the unallocated > area > + * before the allocated area. stop further processing. > + */ > +req->bytes = offs - req->offs; > +return QEMU_RBD_EXIT_DIFF_ITERATE2; > +} > + > +/* > + * assert that we caught all cases above and allocation state has not > + * changed during callbacks. > + */ > +assert(exists == req->exists || !req->bytes); > +req->exists = exists; > + > +/* > + * assert that we either return an unallocated block or have got > callbacks > + * for all allocated blocks present. 
> + */ > +assert(!req->exists || offs == req->offs + req->bytes); > +req->bytes = offs + len - req->offs; > + > +return 0; > +} > + > +static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, > + bool want_zero, int64_t > offset, > + int64_t bytes, int64_t > *pnum, > + int64_t *map, > + BlockDriverState **file) > +{ > +BDRVRBDState *s = bs->opaque; > +int ret, r; Nit: I would rename ret to status or something like that to make it clear(er) that it is an actual value and never an error. Or, even better, drop it entirely and return one of the two bitm
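The callback logic under review is easier to follow in isolation. The following is a simplified standalone model, not the patch itself: the `Toy*` names are hypothetical, the iterator is a plain loop over a sorted extent array rather than `rbd_diff_iterate2`, and it assumes every extent starts at or after the requested offset. It keeps the same idea of returning a sentinel value to stop iteration once the allocation state changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STOP (-9000)  /* sentinel, mirrors the role of the -9000 macro */

typedef struct { uint64_t offs; uint64_t len; bool exists; } ToyExtent;
typedef struct { uint64_t offs; uint64_t bytes; bool exists; } ToyReq;

/* Called once per extent; accumulates one run of uniform allocation state. */
static int toy_cb(uint64_t offs, uint64_t len, bool exists, ToyReq *req)
{
    if (req->exists && (offs > req->offs + req->bytes || !exists)) {
        /* Allocated run ended at a gap or a hole: keep req->bytes as is. */
        return TOY_STOP;
    }
    if (!req->exists && exists && offs > req->offs) {
        /* Unallocated run ends where the first allocated extent starts. */
        req->bytes = offs - req->offs;
        return TOY_STOP;
    }
    req->exists = exists;
    req->bytes = offs + len - req->offs;
    return 0;
}

/* Drive the callback over sorted extents, as a diff iterator would. */
static ToyReq toy_block_status(const ToyExtent *ext, size_t n,
                               uint64_t start, uint64_t bytes)
{
    ToyReq req = { .offs = start, .bytes = 0, .exists = false };
    for (size_t i = 0; i < n; i++) {
        if (toy_cb(ext[i].offs, ext[i].len, ext[i].exists, &req) == TOY_STOP) {
            break;
        }
    }
    if (req.bytes == 0) {
        /* Assumption: no callback at all means the range is unallocated. */
        req.bytes = bytes;
    }
    return req;
}
```

In this model the "started allocated, then reached a hole" branch is only reachable if the iterator reports an extent with `exists == false`, which is the situation the review question about a test case is probing.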
[PATCH 0/3] vdpa: Check iova range on memory regions ops
At this moment vdpa will not send memory regions bigger than 1<<63. However, the actual iova range could be far more restrictive than that.

Since we can obtain the range through a vdpa ioctl call, just save it from the beginning of the operation and check against it.

Eugenio Pérez (3):
  vdpa: Skip protected ram IOMMU mappings
  vdpa: Add vhost_vdpa_section_end
  vdpa: Check for iova range at mappings changes

 include/hw/virtio/vhost-vdpa.h |  2 +
 hw/virtio/vhost-vdpa.c         | 83 +-
 hw/virtio/trace-events         |  1 +
 3 files changed, 65 insertions(+), 21 deletions(-)

-- 
2.27.0
[PATCH 3/3] vdpa: Check for iova range at mappings changes
Check vdpa device range before updating memory regions so we don't add any outside of it, and report the invalid change if any. Signed-off-by: Eugenio Pérez --- include/hw/virtio/vhost-vdpa.h | 2 + hw/virtio/vhost-vdpa.c | 68 ++ hw/virtio/trace-events | 1 + 3 files changed, 55 insertions(+), 16 deletions(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index a8963da2d9..c288cf7ecb 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -13,6 +13,7 @@ #define HW_VIRTIO_VHOST_VDPA_H #include "hw/virtio/virtio.h" +#include "standard-headers/linux/vhost_types.h" typedef struct VhostVDPAHostNotifier { MemoryRegion mr; @@ -24,6 +25,7 @@ typedef struct vhost_vdpa { uint32_t msg_type; bool iotlb_batch_begin_sent; MemoryListener listener; +struct vhost_vdpa_iova_range iova_range; struct vhost_dev *dev; VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; } VhostVDPA; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index a1de6c7c9c..26d0258723 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -33,20 +33,34 @@ static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section) return llend; } -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section) -{ -return (!memory_region_is_ram(section->mr) && -!memory_region_is_iommu(section->mr)) || -memory_region_is_protected(section->mr) || - /* vhost-vDPA doesn't allow MMIO to be mapped */ -memory_region_is_ram_device(section->mr) || - /* -* Sizing an enabled 64-bit BAR can cause spurious mappings to -* addresses in the upper part of the 64-bit address space. These -* are never accessed by the CPU and beyond the address width of -* some IOMMU hardware. TODO: VDPA should tell us the IOMMU width. 
-*/ - section->offset_within_address_space & (1ULL << 63); +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section, +uint64_t iova_min, +uint64_t iova_max) +{ +Int128 llend; +bool r = (!memory_region_is_ram(section->mr) && + !memory_region_is_iommu(section->mr)) || + memory_region_is_protected(section->mr) || + /* vhost-vDPA doesn't allow MMIO to be mapped */ + memory_region_is_ram_device(section->mr); +if (r) { +return true; +} + +if (section->offset_within_address_space < iova_min) { +error_report("RAM section out of device range (min=%lu, addr=%lu)", + iova_min, section->offset_within_address_space); +return true; +} + +llend = vhost_vdpa_section_end(section); +if (int128_make64(llend) > iova_max) { +error_report("RAM section out of device range (max=%lu, end addr=%lu)", + iova_max, (uint64_t)int128_make64(llend)); +return true; +} + +return false; } static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, @@ -158,7 +172,8 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener, void *vaddr; int ret; -if (vhost_vdpa_listener_skipped_section(section)) { +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first, +v->iova_range.last)) { return; } @@ -216,7 +231,8 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener, Int128 llend, llsize; int ret; -if (vhost_vdpa_listener_skipped_section(section)) { +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first, +v->iova_range.last)) { return; } @@ -284,9 +300,24 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status) vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s); } +static int vhost_vdpa_get_iova_range(struct vhost_vdpa *v) +{ +int ret; + +ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE, &v->iova_range); +if (ret != 0) { +return ret; +} + +trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first, +v->iova_range.last); +return ret; +} + static int vhost_vdpa_init(struct vhost_dev *dev, void 
*opaque, Error **errp) { struct vhost_vdpa *v; +int r; assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA); trace_vhost_vdpa_init(dev, opaque); @@ -296,6 +327,11 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp) v->listener = vhost_vdpa_memory_listener; v->msg_type = VHOST_IOTLB_MSG_V2; +r = vhost_vdpa_get_iova_range(v); +if (unlikely(!r)) { +return r; +} + vhost_vdpa_add
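The range test this patch adds can be sketched without QEMU's `Int128` helpers. The version below is a standalone illustration with hypothetical names (`ToyIovaRange`, `toy_section_in_range`); it assumes the device range is given as inclusive `first`/`last` bounds and that a section is described by its start address and size. It works on the last byte of the section so that a section ending exactly at the top of the 64-bit space does not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range type; bounds are assumed to be inclusive. */
typedef struct {
    uint64_t first;
    uint64_t last;
} ToyIovaRange;

/* True if [start, start + size - 1] lies fully inside the device range. */
static bool toy_section_in_range(uint64_t start, uint64_t size,
                                 const ToyIovaRange *r)
{
    if (size == 0) {
        return false;               /* nothing to map */
    }
    if (start < r->first) {
        return false;               /* begins below the device range */
    }
    if (size - 1 > UINT64_MAX - start) {
        return false;               /* last byte would wrap past 2^64 */
    }
    return start + size - 1 <= r->last;
}
```

A listener would skip (and report) any section for which this returns false. Note also that in the posted hunks the helper returns 0 on success, so the caller's error check needs to trigger on a non-zero result for the early return to mean "failed".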
Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)
On 05/10/2021 17:42, Thomas Huth wrote: On 05/10/2021 08.18, Alexey Kardashevskiy wrote: On 05/10/2021 15:44, Christophe Leroy wrote: Le 05/10/2021 à 02:48, David Gibson a écrit : On Fri, Oct 01, 2021 at 04:18:49PM +0200, Thomas Huth wrote: On 01/10/2021 15.04, Christophe Leroy wrote: Le 01/10/2021 à 14:04, Thomas Huth a écrit : On 01/10/2021 13.12, Peter Maydell wrote: On Fri, 1 Oct 2021 at 10:43, Thomas Huth wrote: Nevertheless, as long as nobody has a hint where to find that ppc405_rom.bin, I think both boards are pretty useless in QEMU (as far as I can see, they do not work without the bios at all, so it's also not possible to use a Linux image with the "-kernel" CLI option directly). It is at least in theory possible to run bare-metal code on either board, by passing either a pflash or a bios argument. True. I did some more research, and seems like there was once support for those boards in u-boot, but it got removed there a couple of years ago already: https://gitlab.com/qemu-project/u-boot/-/commit/98f705c9cefdf https://gitlab.com/qemu-project/u-boot/-/commit/b147ff2f37d5b https://gitlab.com/qemu-project/u-boot/-/commit/7514037bcdc37 But I agree that there seem to be no signs of anybody actually successfully using these boards for anything, so we should deprecate-and-delete them. Yes, let's mark them as deprecated now ... if someone still uses them and speaks up, we can still revert the deprecation again. I really would like to be able to use them to validate Linux Kernel changes, hence looking for that missing BIOS. If we remove ppc405 from QEMU, we won't be able to do any regression tests of Linux Kernel on those processors. If you/someone managed to compile an old version of u-boot for one of these two boards, so that we would finally have something for regression testing, we can of course also keep the boards in QEMU... 
I can see that it would be useful for some cases, but unless someone volunteers to track down the necessary firmware and look after it, I think we do need to deprecate it - I certainly don't have the capacity to look into this.

I will look at it, please allow me a few weeks though.

Well, building it was not hard but now I'd like to know what board QEMU actually emulates, there are way too many codenames and PVRs. Here is what I was building: https://github.com/aik/u-boot/tree/ppc4xx-qemu

CONFIG_SYS_ARCH="powerpc"
CONFIG_SYS_CPU="ppc4xx"
CONFIG_SYS_VENDOR="esd"
CONFIG_SYS_BOARD="pmc405de"
CONFIG_SYS_CONFIG_NAME="PMC405DE"

Is this any use?

If I've got u-boot commit 98f705c9cefdfdba62c069821bbba10273a0a8 right, there used to be SYS_BOARD="405ep" config before that removal, so that sounds like a promising match for the ref405ep of QEMU?

Tricky. The board can be 405ep if TARGET_IO/TARGET_DLVISION/TARGET_DLVISION_10G selected. Neither compiles at 98f705c9cefdfdba62c^ due to missing CONFIG_SYS_PCI_PTM1PCI :-/

The support for "taihu" even got removed earlier, in u-boot commit 123b6cd7a4f75536734a7bff97db6eebce614bd1, and the commit message says that it did not compile anymore at the end, so you might need to check out an even older version for that one.

What is so special about taihu?

-- 
Alexey
[PATCH 1/3] vdpa: Skip protected ram IOMMU mappings
Following the logic of commit 56918a126ae ("memory: Add RAM_PROTECTED flag to skip IOMMU mappings") with VFIO, skip memory sections inaccessible via normal mechanisms, including DMA.

Signed-off-by: Eugenio Pérez
---
 hw/virtio/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 47d7a5a23d..ea1aa71ad8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -28,6 +28,7 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
+           memory_region_is_protected(section->mr) ||
            /* vhost-vDPA doesn't allow MMIO to be mapped */
            memory_region_is_ram_device(section->mr) ||
            /*
-- 
2.27.0
Re: [PATCH] MAINTAINERS: Split HPPA TCG vs HPPA machines/hardware
On 10/4/21 10:38, Philippe Mathieu-Daudé wrote: > Hardware emulated models don't belong to the TCG MAINTAINERS > section. Move them to the 'HP-PARISC Machines' section. > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Helge Deller > --- > MAINTAINERS | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 50435b8d2f5..002620c6cad 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -205,10 +205,7 @@ HPPA (PA-RISC) TCG CPUs > M: Richard Henderson > S: Maintained > F: target/hppa/ > -F: hw/hppa/ > F: disas/hppa.c > -F: hw/net/*i82596* > -F: include/hw/net/lasi_82596.h > > M68K TCG CPUs > M: Laurent Vivier > @@ -1098,6 +1095,8 @@ R: Helge Deller > S: Odd Fixes > F: configs/devices/hppa-softmmu/default.mak > F: hw/hppa/ > +F: hw/net/*i82596* > +F: include/hw/net/lasi_82596.h > F: pc-bios/hppa-firmware.img > > M68K Machines >
Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)
On 05/10/2021 10.05, Alexey Kardashevskiy wrote: On 05/10/2021 17:42, Thomas Huth wrote: On 05/10/2021 08.18, Alexey Kardashevskiy wrote: On 05/10/2021 15:44, Christophe Leroy wrote: On 05/10/2021 at 02:48, David Gibson wrote: On Fri, Oct 01, 2021 at 04:18:49PM +0200, Thomas Huth wrote: On 01/10/2021 15.04, Christophe Leroy wrote: On 01/10/2021 at 14:04, Thomas Huth wrote: On 01/10/2021 13.12, Peter Maydell wrote: On Fri, 1 Oct 2021 at 10:43, Thomas Huth wrote: Nevertheless, as long as nobody has a hint where to find that ppc405_rom.bin, I think both boards are pretty useless in QEMU (as far as I can see, they do not work without the bios at all, so it's also not possible to use a Linux image with the "-kernel" CLI option directly). It is at least in theory possible to run bare-metal code on either board, by passing either a pflash or a bios argument. True. I did some more research, and seems like there was once support for those boards in u-boot, but it got removed there a couple of years ago already: https://gitlab.com/qemu-project/u-boot/-/commit/98f705c9cefdf https://gitlab.com/qemu-project/u-boot/-/commit/b147ff2f37d5b https://gitlab.com/qemu-project/u-boot/-/commit/7514037bcdc37 But I agree that there seem to be no signs of anybody actually successfully using these boards for anything, so we should deprecate-and-delete them. Yes, let's mark them as deprecated now ... if someone still uses them and speaks up, we can still revert the deprecation again. I really would like to be able to use them to validate Linux Kernel changes, hence looking for that missing BIOS. If we remove ppc405 from QEMU, we won't be able to do any regression tests of Linux Kernel on those processors. If you/someone managed to compile an old version of u-boot for one of these two boards, so that we would finally have something for regression testing, we can of course also keep the boards in QEMU...
I can see that it would be useful for some cases, but unless someone volunteers to track down the necessary firmware and look after it, I think we do need to deprecate it - I certainly don't have the capacity to look into this. I will look at it, please allow me a few weeks though. Well, building it was not hard but now I'd like to know what board QEMU actually emulates, there are way too many codenames and PVRs. Here is what I was building: https://github.com/aik/u-boot/tree/ppc4xx-qemu CONFIG_SYS_ARCH="powerpc" CONFIG_SYS_CPU="ppc4xx" CONFIG_SYS_VENDOR="esd" CONFIG_SYS_BOARD="pmc405de" CONFIG_SYS_CONFIG_NAME="PMC405DE" Is this any use? If I've got u-boot commit 98f705c9cefdfdba62c069821bbba10273a0a8 right, there used to be SYS_BOARD="405ep" config before that removal, so that sounds like a promising match for the ref405ep of QEMU? Tricky. The board can be 405ep if TARGET_IO/TARGET_DLVISION/TARGET_DLVISION_10G selected. Neither compiles at 98f705c9cefdfdba62c^ due to missing CONFIG_SYS_PCI_PTM1PCI :-/ The support for "taihu" even got removed earlier, in u-boot commit 123b6cd7a4f75536734a7bff97db6eebce614bd1 , and the commit message says that it did not compile anymore at the end, so you might need to check out an even older version for that one. What is so special about taihu? taihu is the other 405 board defined in hw/ppc/ppc405_boards.c (which I suggested to deprecate now) Thomas
[PATCH 2/3] vdpa: Add vhost_vdpa_section_end
Abstract this operation, that will be reused when validating the region against the iova range that the device supports. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-vdpa.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index ea1aa71ad8..a1de6c7c9c 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -24,6 +24,15 @@ #include "trace.h" #include "qemu-common.h" +static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section) +{ +Int128 llend = int128_make64(section->offset_within_address_space); +llend = int128_add(llend, section->size); +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); + +return llend; +} + static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section) { return (!memory_region_is_ram(section->mr) && @@ -160,10 +169,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener, } iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); -llend = int128_make64(section->offset_within_address_space); -llend = int128_add(llend, section->size); -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); - +llend = vhost_vdpa_section_end(section); if (int128_ge(int128_make64(iova), llend)) { return; } @@ -221,9 +227,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener, } iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); -llend = int128_make64(section->offset_within_address_space); -llend = int128_add(llend, section->size); -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); +llend = vhost_vdpa_section_end(section); trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend)); -- 2.27.0
Re: [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
On Mon, Oct 04, 2021 at 03:58:09PM -0400, Vivek Goyal wrote: > On Mon, Oct 04, 2021 at 02:54:17PM +0100, Stefan Hajnoczi wrote: > > On Thu, Sep 30, 2021 at 11:30:30AM -0400, Vivek Goyal wrote: > > > Add helpers to create/cleanup virtuqueues and use those helpers. I will > > > > s/virtuqueues/virtqueues/ > > > > > need to reconfigure queues in later patches and using helpers will allow > > > reusing the code. > > > > > > Signed-off-by: Vivek Goyal > > > --- > > > hw/virtio/vhost-user-fs.c | 87 +++ > > > 1 file changed, 52 insertions(+), 35 deletions(-) > > > > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > > > index c595957983..d1efbc5b18 100644 > > > --- a/hw/virtio/vhost-user-fs.c > > > +++ b/hw/virtio/vhost-user-fs.c > > > @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, > > > uint8_t status) > > > } > > > } > > > > > > +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq) > > > +{ > > > +/* > > > + * Not normally called; it's the daemon that handles the queue; > > > + * however virtio's cleanup path can call this. > > > + */ > > > +} > > > + > > > +static void vuf_create_vqs(VirtIODevice *vdev) > > > +{ > > > +VHostUserFS *fs = VHOST_USER_FS(vdev); > > > +unsigned int i; > > > + > > > +/* Hiprio queue */ > > > +fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, > > > + vuf_handle_output); > > > + > > > +/* Request queues */ > > > +fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues); > > > +for (i = 0; i < fs->conf.num_request_queues; i++) { > > > +fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size, > > > + vuf_handle_output); > > > +} > > > + > > > +/* 1 high prio queue, plus the number configured */ > > > +fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues; > > > +fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, > > > fs->vhost_dev.nvqs); > > > > These two lines prepare for vhost_dev_init(), so moving them here is > > debatable. 
If a caller is going to use this function again in the future > > then they need to be sure to also call vhost_dev_init(). For now it > > looks safe, so I guess it's okay. > > Hmm..., I do call this function later from vuf_set_features() and > reconfigure the queues. I see that I don't call vhost_dev_init() > in that path. I am not even sure if I should be calling > vhost_dev_init() from inside vuf_set_features(). > > So core requirement is that at the time of first creating device > I have no idea if driver supports notification queue or not. So > I do create device with notification queue. But later if driver > (and possibly vhost device) does not support notification queue, > then we need to reconfigure queues. What's the correct way to > do that? Ah, I see. The simplest approach is to always allocate the maximum number of virtqueues. QEMU's vhost-user-fs device shouldn't need to worry about which virtqueues are actually in use. Let virtiofsd (the vhost-user backend) worry about that. I posted ideas about how to do that in a reply to another patch in this series. I can't guarantee it will work, but I think it's worth exploring. Stefan
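One possible reading of the "always allocate the maximum" suggestion, sketched outside of QEMU. Everything here is an assumption made for illustration: `sketch_dev` and its helpers are hypothetical names, not QEMU or virtiofsd APIs, and the real device may need more than this. The point is only that feature negotiation changes which queues count as in use, never how many exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; not the real VHostUserFS structure. */
struct sketch_dev {
    size_t num_request_queues;     /* configured by the user */
    size_t allocated_queues;       /* fixed once, at creation time */
    bool notification_negotiated;  /* known only after feature negotiation */
};

/* Create the device with every queue it could need: hiprio + notification + requests. */
static void sketch_create(struct sketch_dev *d, size_t num_request_queues)
{
    d->num_request_queues = num_request_queues;
    d->allocated_queues = 2 + num_request_queues;
    d->notification_negotiated = false;
}

/* Feature negotiation only records the outcome; no queue is added or deleted. */
static void sketch_set_features(struct sketch_dev *d, bool has_notification)
{
    d->notification_negotiated = has_notification;
}

/* What the backend would consult to decide how many queues are really in use. */
static size_t sketch_active_queues(const struct sketch_dev *d)
{
    return (d->notification_negotiated ? 2 : 1) + d->num_request_queues;
}
```

With this shape there is nothing to tear down or re-create when the driver turns out not to support the notification queue; one allocated queue simply stays idle.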
[PATCH v1 1/2] migration: block-dirty-bitmap: add missing qemu_mutex_lock_iothread
init_dirty_bitmap_migration assumes the iothread lock (BQL) to be held, but instead it isn't. Instead of adding the lock to qemu_savevm_state_setup(), follow the same pattern as the other ->save_setup callbacks and lock+unlock inside dirty_bitmap_save_setup(). Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Stefan Hajnoczi --- migration/block-dirty-bitmap.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c index 35f5ef688d..9aba7d9c22 100644 --- a/migration/block-dirty-bitmap.c +++ b/migration/block-dirty-bitmap.c @@ -1215,7 +1215,10 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque) { DBMSaveState *s = &((DBMState *)opaque)->save; SaveBitmapState *dbms = NULL; + +qemu_mutex_lock_iothread(); if (init_dirty_bitmap_migration(s) < 0) { +qemu_mutex_unlock_iothread(); return -1; } @@ -1223,7 +1226,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque) send_bitmap_start(f, s, dbms); } qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS); - +qemu_mutex_unlock_iothread(); return 0; } -- 2.27.0
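The lock+unlock pattern used in the patch above, with an unlock on the early-return path as well as the normal one, can be sketched in isolation. This is a minimal illustration, not QEMU code: a pthread mutex stands in for the BQL, and `big_lock_acquire`, `big_lock_release`, `init_needs_lock` and `save_setup` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the BQL; QEMU takes the real one with qemu_mutex_lock_iothread(). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool big_lock_held;   /* illustrative bookkeeping, written only under the mutex */

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
}

static void big_lock_release(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
}

/* Hypothetical helper that, like init_dirty_bitmap_migration(), requires the lock. */
static int init_needs_lock(bool fail)
{
    assert(big_lock_held);
    return fail ? -1 : 0;
}

/* Hypothetical setup callback: lock on entry, unlock on every return path. */
static int save_setup(bool fail)
{
    big_lock_acquire();
    if (init_needs_lock(fail) < 0) {
        big_lock_release();   /* the early return must not leak the lock */
        return -1;
    }
    /* ... remaining work that needs the lock ... */
    big_lock_release();
    return 0;
}
```

Forgetting the release in the error branch would leave the mutex held forever, which is why the patch adds an unlock before `return -1` and not only at the end of the function.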
Re: [PATCH 2/3] vdpa: Add vhost_vdpa_section_end
On Tue, Oct 05, 2021 at 10:01:30AM +0200, Eugenio Pérez wrote: > Abstract this operation, that will be reused when validating the region > against the iova range that the device supports. > > Signed-off-by: Eugenio Pérez Note that as defined end is actually 1 byte beyond end of section. As such it can e.g. overflow if cast to u64. So be careful to use int128 ops with it. Also - document? > --- > hw/virtio/vhost-vdpa.c | 18 +++--- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > index ea1aa71ad8..a1de6c7c9c 100644 > --- a/hw/virtio/vhost-vdpa.c > +++ b/hw/virtio/vhost-vdpa.c > @@ -24,6 +24,15 @@ > #include "trace.h" > #include "qemu-common.h" > > +static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section) > +{ > +Int128 llend = int128_make64(section->offset_within_address_space); > +llend = int128_add(llend, section->size); > +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > + > +return llend; > +} > + > static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section) > { > return (!memory_region_is_ram(section->mr) && > @@ -160,10 +169,7 @@ static void > vhost_vdpa_listener_region_add(MemoryListener *listener, > } > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > -llend = int128_make64(section->offset_within_address_space); > -llend = int128_add(llend, section->size); > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > - > +llend = vhost_vdpa_section_end(section); > if (int128_ge(int128_make64(iova), llend)) { > return; > } > @@ -221,9 +227,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener > *listener, > } > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > -llend = int128_make64(section->offset_within_address_space); > -llend = int128_add(llend, section->size); > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > +llend = vhost_vdpa_section_end(section); > > 
trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend)); > > -- > 2.27.0
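To make the review comment above concrete: the value returned is an exclusive end, so for a section that reaches the top of the 64-bit address space it equals 2^64 and no longer fits in a u64. The sketch below is an assumption-laden illustration, not QEMU code: it uses the GCC/Clang `unsigned __int128` extension in place of QEMU's Int128, a fixed 4 KiB page size in place of TARGET_PAGE_MASK, and the hypothetical names `section_end` and `section_exceeds`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch; QEMU uses TARGET_PAGE_MASK instead. */
#define SKETCH_PAGE_SIZE 4096ULL

typedef unsigned __int128 u128;   /* compiler extension standing in for Int128 */

/* Exclusive end of [offset, offset + size), rounded down to a page boundary. */
static u128 section_end(uint64_t offset, u128 size)
{
    u128 end = (u128)offset + size;               /* cannot wrap in 128 bits */
    u128 mask = ~((u128)SKETCH_PAGE_SIZE - 1);    /* page mask widened to 128 bits */
    return end & mask;
}

/* True if the last byte of the section lies above iova_max; compared in 128 bits. */
static bool section_exceeds(uint64_t offset, u128 size, uint64_t iova_max)
{
    u128 end = section_end(offset, size);
    return end > (u128)iova_max + 1;              /* end is one past the last byte */
}
```

Truncating the result to 64 bits turns the 2^64 case into 0, so any range check has to happen before narrowing, or entirely in the wide type as done here.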
Re: [PATCH 08/13] virtiofsd: Create a notification queue
On Mon, Oct 04, 2021 at 05:01:07PM -0400, Vivek Goyal wrote: > On Mon, Oct 04, 2021 at 03:30:38PM +0100, Stefan Hajnoczi wrote: > > On Thu, Sep 30, 2021 at 11:30:32AM -0400, Vivek Goyal wrote: > > > Add a notification queue which will be used to send async notifications > > > for file lock availability. > > > > > > Signed-off-by: Vivek Goyal > > > Signed-off-by: Ioannis Angelakopoulos > > > --- > > > hw/virtio/vhost-user-fs-pci.c | 4 +- > > > hw/virtio/vhost-user-fs.c | 62 +-- > > > include/hw/virtio/vhost-user-fs.h | 2 + > > > tools/virtiofsd/fuse_i.h | 1 + > > > tools/virtiofsd/fuse_virtio.c | 70 +++ > > > 5 files changed, 116 insertions(+), 23 deletions(-) > > > > > > diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c > > > index 2ed8492b3f..cdb9471088 100644 > > > --- a/hw/virtio/vhost-user-fs-pci.c > > > +++ b/hw/virtio/vhost-user-fs-pci.c > > > @@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy > > > *vpci_dev, Error **errp) > > > DeviceState *vdev = DEVICE(&dev->vdev); > > > > > > if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) { > > > -/* Also reserve config change and hiprio queue vectors */ > > > -vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2; > > > +/* Also reserve config change, hiprio and notification queue > > > vectors */ > > > +vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3; > > > } > > > > > > qdev_realize(vdev, BUS(&vpci_dev->bus), errp); > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > > > index d1efbc5b18..6bafcf0243 100644 > > > --- a/hw/virtio/vhost-user-fs.c > > > +++ b/hw/virtio/vhost-user-fs.c > > > @@ -31,6 +31,7 @@ static const int user_feature_bits[] = { > > > VIRTIO_F_NOTIFY_ON_EMPTY, > > > VIRTIO_F_RING_PACKED, > > > VIRTIO_F_IOMMU_PLATFORM, > > > +VIRTIO_FS_F_NOTIFICATION, > > > > > > VHOST_INVALID_FEATURE_BIT > > > }; > > > @@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, > > > VirtQueue *vq) > > > */ > > > } > > > 
> > > -static void vuf_create_vqs(VirtIODevice *vdev) > > > +static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq) > > > { > > > VHostUserFS *fs = VHOST_USER_FS(vdev); > > > unsigned int i; > > > @@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev) > > > /* Hiprio queue */ > > > fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, > > > vuf_handle_output); > > > +/* > > > + * Notification queue. Feature negotiation happens later. So at this > > > + * point of time we don't know if driver will use notification queue > > > + * or not. > > > + */ > > > +if (notification_vq) { > > > +fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size, > > > + vuf_handle_output); > > > +} > > > > > > /* Request queues */ > > > fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues); > > > @@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev) > > >vuf_handle_output); > > > } > > > > > > -/* 1 high prio queue, plus the number configured */ > > > -fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues; > > > +/* 1 high prio queue, 1 notification queue plus the number > > > configured */ > > > +if (notification_vq) { > > > +fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues; > > > +} else { > > > +fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues; > > > +} > > > fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, > > > fs->vhost_dev.nvqs); > > > } > > > > > > @@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev) > > > virtio_delete_queue(fs->hiprio_vq); > > > fs->hiprio_vq = NULL; > > > > > > +if (fs->notification_vq) { > > > +virtio_delete_queue(fs->notification_vq); > > > +} > > > +fs->notification_vq = NULL; > > > + > > > for (i = 0; i < fs->conf.num_request_queues; i++) { > > > virtio_delete_queue(fs->req_vqs[i]); > > > } > > > @@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice *vdev, > > > { > > > VHostUserFS *fs = VHOST_USER_FS(vdev); > > > > > > +virtio_add_feature(&features, 
VIRTIO_FS_F_NOTIFICATION); > > > + > > > return vhost_get_features(&fs->vhost_dev, user_feature_bits, > > > features); > > > } > > > > > > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features) > > > +{ > > > +VHostUserFS *fs = VHOST_USER_FS(vdev); > > > + > > > +if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) { > > > +fs->notify_enabled = true; > > > +/* > > > + * If guest first booted with no notification queue support and > > > +
Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)
On 10/5/21 08:18, Alexey Kardashevskiy wrote: On 05/10/2021 15:44, Christophe Leroy wrote: On 05/10/2021 at 02:48, David Gibson wrote: On Fri, Oct 01, 2021 at 04:18:49PM +0200, Thomas Huth wrote: On 01/10/2021 15.04, Christophe Leroy wrote: On 01/10/2021 at 14:04, Thomas Huth wrote: On 01/10/2021 13.12, Peter Maydell wrote: On Fri, 1 Oct 2021 at 10:43, Thomas Huth wrote: Nevertheless, as long as nobody has a hint where to find that ppc405_rom.bin, I think both boards are pretty useless in QEMU (as far as I can see, they do not work without the bios at all, so it's also not possible to use a Linux image with the "-kernel" CLI option directly). It is at least in theory possible to run bare-metal code on either board, by passing either a pflash or a bios argument. True. I did some more research, and seems like there was once support for those boards in u-boot, but it got removed there a couple of years ago already: https://gitlab.com/qemu-project/u-boot/-/commit/98f705c9cefdf https://gitlab.com/qemu-project/u-boot/-/commit/b147ff2f37d5b https://gitlab.com/qemu-project/u-boot/-/commit/7514037bcdc37 But I agree that there seem to be no signs of anybody actually successfully using these boards for anything, so we should deprecate-and-delete them. Yes, let's mark them as deprecated now ... if someone still uses them and speaks up, we can still revert the deprecation again. I really would like to be able to use them to validate Linux Kernel changes, hence looking for that missing BIOS. If we remove ppc405 from QEMU, we won't be able to do any regression tests of Linux Kernel on those processors. If you/someone managed to compile an old version of u-boot for one of these two boards, so that we would finally have something for regression testing, we can of course also keep the boards in QEMU...
I can see that it would be useful for some cases, but unless someone volunteers to track down the necessary firmware and look after it, I think we do need to deprecate it - I certainly don't have the capacity to look into this. I will look at it, please allow me a few weeks though. Well, building it was not hard but now I'd like to know what board QEMU actually emulates, there are way too many codenames and PVRs. yes. We should try to reduce the list below. Deprecating embedded machines is one way. C.
$ ./install/bin/qemu-system-ppc -cpu ?
PowerPC 601_v0           PVR 00010001
PowerPC 601_v1           PVR 00010001
PowerPC 601_v2           PVR 00010002
PowerPC 601              (alias for 601_v2)
PowerPC 601v             (alias for 601_v2)
PowerPC 603              PVR 00030100
PowerPC mpc8240          (alias for 603)
PowerPC vanilla          (alias for 603)
PowerPC 604              PVR 00040103
PowerPC ppc32            (alias for 604)
PowerPC ppc              (alias for 604)
PowerPC default          (alias for 604)
PowerPC 602              PVR 00050100
PowerPC 603e_v1.1        PVR 00060101
PowerPC 603e_v1.2        PVR 00060102
PowerPC 603e_v1.3        PVR 00060103
PowerPC 603e_v1.4        PVR 00060104
PowerPC 603e_v2.2        PVR 00060202
PowerPC 603e_v3          PVR 00060300
PowerPC 603e_v4          PVR 00060400
PowerPC 603e_v4.1        PVR 00060401
PowerPC 603e             (alias for 603e_v4.1)
PowerPC stretch          (alias for 603e_v4.1)
PowerPC 603p             PVR 0007
PowerPC 603e7v           PVR 00070100
PowerPC vaillant         (alias for 603e7v)
PowerPC 603e7v1          PVR 00070101
PowerPC 603e7            PVR 00070200
PowerPC 603e7v2          PVR 00070201
PowerPC 603e7t           PVR 00071201
PowerPC 603r             (alias for 603e7t)
PowerPC goldeneye        (alias for 603e7t)
PowerPC 740_v1.0         PVR 00080100
PowerPC 740e             PVR 00080100
PowerPC 750_v1.0         PVR 00080100
PowerPC 750_v2.0         PVR 00080200
PowerPC 740_v2.0         PVR 00080200
PowerPC 750e             PVR 00080200
PowerPC 750_v2.1         PVR 00080201
PowerPC 740_v2.1         PVR 00080201
PowerPC 750_v2.2         PVR 00080202
PowerPC 740_v2.2         PVR 00080202
PowerPC 750_v3.0         PVR 00080300
PowerPC 740_v3.0         PVR 00080300
PowerPC 750_v3.1         PVR 00080301
PowerPC 750              (alias for 750_v3.1)
PowerPC typhoon          (alias for 750_v3.1)
PowerPC g3               (alias for 750_v3.1)
PowerPC 740_v3.1         PVR 00080301
PowerPC 740              (alias for 740_v3.1)
PowerPC arthur           (alias for 740_v3.1)
PowerPC 750cx_v1.0       PVR 00082100
PowerPC 750cx_v2.0       PVR 00082200
PowerPC 750cx_v2.1       PVR 00082201
PowerPC 750cx_v2.2       PVR 00082202
PowerPC 750cx            (alias for 750cx_v2.2)
PowerPC 750cxe_v2.1      PVR 00082211
PowerPC 750cxe_v2.2      PVR 00082212
PowerPC 750cxe_v2.3      PVR 00082213
PowerPC 750cxe_v2.4      PVR 00082214
PowerPC 750cxe_v3.0      PVR 00082310
PowerPC 750cxe_v3.1      PVR 00082311
PowerPC 745_v1.0         PVR 00083100
PowerPC 755_v1.0         PVR 00083100
PowerPC 755_v1.1         PVR 00083101
Power
[PATCH v1 0/2] Migration: fix missing iothread locking
Some functions (in this case qemu_savevm_state_complete_postcopy() and init_dirty_bitmap_migration()) assume and document that qemu_mutex_lock_iothread() is held. This seems to have been forgotten in some places, and this series aims to fix that. Patch 1 was part of my RFC block layer series "block layer: split block APIs in graph and I/O" but I decided to do a separate series for these two bugs, as they are independent from the API split. Signed-off-by: Emanuele Giuseppe Esposito Emanuele Giuseppe Esposito (2): migration: block-dirty-bitmap: add missing qemu_mutex_lock_iothread migration: add missing qemu_mutex_lock_iothread in migration_completion migration/block-dirty-bitmap.c | 5 - migration/migration.c | 3 +++ 2 files changed, 7 insertions(+), 1 deletion(-) -- 2.27.0
Re: [PATCH V3] block/rbd: implement bdrv_co_block_status
On 05.10.21 at 09:54, Ilya Dryomov wrote: On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven wrote: the qemu rbd driver currently lacks support for bdrv_co_block_status. This results mainly in incorrect progress during block operations (e.g. qemu-img convert with an rbd image as source). This patch utilizes the rbd_diff_iterate2 call from librbd to detect allocated and unallocated (all zero areas). To avoid querying the ceph OSDs for the answer this is only done if the image has the fast-diff feature which depends on the object-map and exclusive-lock features. In this case it is guaranteed that the information is present in memory in the librbd client and thus very fast. If fast-diff is not available all areas are reported to be allocated which is the current behaviour if bdrv_co_block_status is not implemented. Signed-off-by: Peter Lieven --- V2->V3: - check rbd_flags every time (they can change during runtime) [Ilya] - also check for fast-diff invalid flag [Ilya] - *map and *file cant be NULL [Ilya] - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an unallocated area [Ilya] - typo: catched -> caught [Ilya] - changed wording about fast-diff, object-map and exclusive lock in commit msg [Ilya] V1->V2: - add commit comment [Stefano] - use failed_post_open [Stefano] - remove redundant assert [Stefano] - add macro+comment for the magic -9000 value [Stefano] - always set *file if its non NULL [Stefano] block/rbd.c | 126 1 file changed, 126 insertions(+) diff --git a/block/rbd.c b/block/rbd.c index dcf82b15b8..3cb24f9981 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -1259,6 +1259,131 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs, return spec_info; } +typedef struct rbd_diff_req { +uint64_t offs; +uint64_t bytes; +int exists; Hi Peter, Nit: make exists a bool. The one in the callback has to be an int because of the callback signature but let's not spread that.
+} rbd_diff_req; + +/* + * rbd_diff_iterate2 allows to interrupt the exection by returning a negative + * value in the callback routine. Choose a value that does not conflict with + * an existing exitcode and return it if we want to prematurely stop the + * execution because we detected a change in the allocation status. + */ +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000 + +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len, + int exists, void *opaque) +{ +struct rbd_diff_req *req = opaque; + +assert(req->offs + req->bytes <= offs); + +if (req->exists && offs > req->offs + req->bytes) { +/* + * we started in an allocated area and jumped over an unallocated area, + * req->bytes contains the length of the allocated area before the + * unallocated area. stop further processing. + */ +return QEMU_RBD_EXIT_DIFF_ITERATE2; +} +if (req->exists && !exists) { +/* + * we started in an allocated area and reached a hole. req->bytes + * contains the length of the allocated area before the hole. + * stop further processing. + */ +return QEMU_RBD_EXIT_DIFF_ITERATE2; Do you have a test case for when this branch is taken? That would happen if you diff from a snapshot, the question is if it can also happen if the image is a clone from a snapshot? +} +if (!req->exists && exists && offs > req->offs) { +/* + * we started in an unallocated area and hit the first allocated + * block. req->bytes must be set to the length of the unallocated area + * before the allocated area. stop further processing. + */ +req->bytes = offs - req->offs; +return QEMU_RBD_EXIT_DIFF_ITERATE2; +} + +/* + * assert that we caught all cases above and allocation state has not + * changed during callbacks. + */ +assert(exists == req->exists || !req->bytes); +req->exists = exists; + +/* + * assert that we either return an unallocated block or have got callbacks + * for all allocated blocks present. 
+ */ +assert(!req->exists || offs == req->offs + req->bytes); +req->bytes = offs + len - req->offs; + +return 0; +} + +static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, + bool want_zero, int64_t offset, + int64_t bytes, int64_t *pnum, + int64_t *map, + BlockDriverState **file) +{ +BDRVRBDState *s = bs->opaque; +int ret, r; Nit: I would rename ret to status or something like that to make it clear(er) that it is an actual value and never an error. Or, even better, drop it entirely and return one of the two bitmasks directly. +struct rbd_diff_req req = { .offs =
Re: [PATCH v4 2/3] docs: (further) remove non-reference uses of single backticks
On 10/4/21 23:52, John Snow wrote: The series rotted already. Here's the new changes. Signed-off-by: John Snow Reviewed-by: Damien Hedde --- docs/system/i386/sgx.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/system/i386/sgx.rst b/docs/system/i386/sgx.rst index f103ae2a2fd..9aa161af1a1 100644 --- a/docs/system/i386/sgx.rst +++ b/docs/system/i386/sgx.rst @@ -77,9 +77,9 @@ CPUID Due to its myriad dependencies, SGX is currently not listed as supported in any of Qemu's built-in CPU configuration. To expose SGX (and SGX Launch -Control) to a guest, you must either use `-cpu host` to pass-through the +Control) to a guest, you must either use ``-cpu host`` to pass-through the host CPU model, or explicitly enable SGX when using a built-in CPU model, -e.g. via `-cpu ,+sgx` or `-cpu ,+sgx,+sgxlc`. +e.g. via ``-cpu ,+sgx`` or ``-cpu ,+sgx,+sgxlc``. All SGX sub-features enumerated through CPUID, e.g. SGX2, MISCSELECT, ATTRIBUTES, etc... can be restricted via CPUID flags. Be aware that enforcing @@ -126,7 +126,7 @@ creating VM with SGX. Feature Control ~~~ -Qemu SGX updates the `etc/msr_feature_control` fw_cfg entry to set the SGX +Qemu SGX updates the ``etc/msr_feature_control`` fw_cfg entry to set the SGX (bit 18) and SGX LC (bit 17) flags based on their respective CPUID support, i.e. existing guest firmware will automatically set SGX and SGX LC accordingly, assuming said firmware supports fw_cfg.msr_feature_control.
[PATCH v1 2/2] migration: add missing qemu_mutex_lock_iothread in migration_completion
qemu_savevm_state_complete_postcopy assumes the iothread lock (BQL) to be held, but instead it isn't. Signed-off-by: Emanuele Giuseppe Esposito --- migration/migration.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 041b8451a6..215d5281f2 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3182,7 +3182,10 @@ static void migration_completion(MigrationState *s) } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { trace_migration_completion_postcopy_end(); +qemu_mutex_lock_iothread(); qemu_savevm_state_complete_postcopy(s->to_dst_file); +qemu_mutex_unlock_iothread(); + trace_migration_completion_postcopy_end_after_complete(); } else if (s->state == MIGRATION_STATUS_CANCELLING) { goto fail; -- 2.27.0
Re: [PATCH] qapi: Make some ObjectTypes depend on the build settings
Thomas Huth writes: > Some of the ObjectType entries already depend on CONFIG_* switches. > Some others also only make sense with certain configurations, but > are currently always listed in the ObjectType enum. Let's make them > depend on the corresponding CONFIG_* switches, too, so that upper > layers (like libvirt) have a better way to determine which features > are available in QEMU. > > Signed-off-by: Thomas Huth All these look good to me. I didn't look for more. Reviewed-by: Markus Armbruster
Re: [PATCH 3/3] vdpa: Check for iova range at mappings changes
On Tue, Oct 05, 2021 at 10:01:31AM +0200, Eugenio Pérez wrote: > Check vdpa device range before updating memory regions so we don't add > any outside of it, and report the invalid change if any. > > Signed-off-by: Eugenio Pérez > --- > include/hw/virtio/vhost-vdpa.h | 2 + > hw/virtio/vhost-vdpa.c | 68 ++ > hw/virtio/trace-events | 1 + > 3 files changed, 55 insertions(+), 16 deletions(-) > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > index a8963da2d9..c288cf7ecb 100644 > --- a/include/hw/virtio/vhost-vdpa.h > +++ b/include/hw/virtio/vhost-vdpa.h > @@ -13,6 +13,7 @@ > #define HW_VIRTIO_VHOST_VDPA_H > > #include "hw/virtio/virtio.h" > +#include "standard-headers/linux/vhost_types.h" > > typedef struct VhostVDPAHostNotifier { > MemoryRegion mr; > @@ -24,6 +25,7 @@ typedef struct vhost_vdpa { > uint32_t msg_type; > bool iotlb_batch_begin_sent; > MemoryListener listener; > +struct vhost_vdpa_iova_range iova_range; > struct vhost_dev *dev; > VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; > } VhostVDPA; > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > index a1de6c7c9c..26d0258723 100644 > --- a/hw/virtio/vhost-vdpa.c > +++ b/hw/virtio/vhost-vdpa.c > @@ -33,20 +33,34 @@ static Int128 vhost_vdpa_section_end(const > MemoryRegionSection *section) > return llend; > } > > -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section) > -{ > -return (!memory_region_is_ram(section->mr) && > -!memory_region_is_iommu(section->mr)) || > -memory_region_is_protected(section->mr) || > - /* vhost-vDPA doesn't allow MMIO to be mapped */ > -memory_region_is_ram_device(section->mr) || > - /* > -* Sizing an enabled 64-bit BAR can cause spurious mappings to > -* addresses in the upper part of the 64-bit address space. These > -* are never accessed by the CPU and beyond the address width of > -* some IOMMU hardware. TODO: VDPA should tell us the IOMMU > width. 
> -*/ > - section->offset_within_address_space & (1ULL << 63); > +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section, > +uint64_t iova_min, > +uint64_t iova_max) > +{ > +Int128 llend; > +bool r = (!memory_region_is_ram(section->mr) && > + !memory_region_is_iommu(section->mr)) || > + memory_region_is_protected(section->mr) || > + /* vhost-vDPA doesn't allow MMIO to be mapped */ > + memory_region_is_ram_device(section->mr); > +if (r) { > +return true; > +} > + > +if (section->offset_within_address_space < iova_min) { > +error_report("RAM section out of device range (min=%lu, addr=%lu)", > + iova_min, section->offset_within_address_space); > +return true; > +} > + > +llend = vhost_vdpa_section_end(section); > +if (int128_make64(llend) > iova_max) { I am puzzled by this. You are taking a Int128, converting to u64, converting back to Int128, and comparing to u64. Head spins. What is all this back and forth trying to achieve? > +error_report("RAM section out of device range (max=%lu, end > addr=%lu)", > + iova_max, (uint64_t)int128_make64(llend)); > +return true; > +} > + > +return false; > } > > static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, > @@ -158,7 +172,8 @@ static void vhost_vdpa_listener_region_add(MemoryListener > *listener, > void *vaddr; > int ret; > > -if (vhost_vdpa_listener_skipped_section(section)) { > +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first, > +v->iova_range.last)) { > return; > } > > @@ -216,7 +231,8 @@ static void vhost_vdpa_listener_region_del(MemoryListener > *listener, > Int128 llend, llsize; > int ret; > > -if (vhost_vdpa_listener_skipped_section(section)) { > +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first, > +v->iova_range.last)) { > return; > } > > @@ -284,9 +300,24 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, > uint8_t status) > vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s); > } > > +static int 
vhost_vdpa_get_iova_range(struct vhost_vdpa *v) > +{ > +int ret; > + > +ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE, &v->iova_range); > +if (ret != 0) { > +return ret; > +} > + > +trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first, > +v->iova_range.last); > +return ret; > +} > + > static int vhost_vdpa_init(stru
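A note on the Int128 question raised in the review above: one way to avoid the back-and-forth conversions is to do the whole bounds check in a single integer width. The following is only a minimal sketch and not the code from the patch. The helper name section_in_iova_range and its parameters are hypothetical, it assumes the section is described by a 64-bit start and a 64-bit size, and it treats both iova_min and iova_max as inclusive bounds (as the first/last naming of the iova_range fields suggests).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: returns true when the section
 * [start, start + size - 1] lies entirely inside [iova_min, iova_max].
 * Both bounds are treated as inclusive (assumption).
 */
static bool section_in_iova_range(uint64_t start, uint64_t size,
                                  uint64_t iova_min, uint64_t iova_max)
{
    uint64_t last;

    if (size == 0) {
        return false;               /* empty sections are rejected here */
    }
    if (start < iova_min) {
        return false;               /* begins below the device range */
    }
    last = start + (size - 1);
    if (last < start) {
        return false;               /* start + size - 1 wrapped around */
    }
    return last <= iova_max;        /* must not end above the range */
}
```

Computing the last byte rather than the exclusive end keeps a section that finishes exactly at the top of the 64-bit space representable, which is presumably why the original code reaches for Int128 in the first place.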
Re: [PATCH V3] block/rbd: implement bdrv_co_block_status
On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven wrote: > > Am 05.10.21 um 09:54 schrieb Ilya Dryomov: > > On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven wrote: > >> the qemu rbd driver currently lacks support for bdrv_co_block_status. > >> This results mainly in incorrect progress during block operations (e.g. > >> qemu-img convert with an rbd image as source). > >> > >> This patch utilizes the rbd_diff_iterate2 call from librbd to detect > >> allocated and unallocated (all zero areas). > >> > >> To avoid querying the ceph OSDs for the answer this is only done if > >> the image has the fast-diff feature which depends on the object-map and > >> exclusive-lock features. In this case it is guaranteed that the information > >> is present in memory in the librbd client and thus very fast. > >> > >> If fast-diff is not available all areas are reported to be allocated > >> which is the current behaviour if bdrv_co_block_status is not implemented. > >> > >> Signed-off-by: Peter Lieven > >> --- > >> V2->V3: > >> - check rbd_flags every time (they can change during runtime) [Ilya] > >> - also check for fast-diff invalid flag [Ilya] > >> - *map and *file cant be NULL [Ilya] > >> - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an > >>unallocated area [Ilya] > >> - typo: catched -> caught [Ilya] > >> - changed wording about fast-diff, object-map and exclusive lock in > >>commit msg [Ilya] > >> > >> V1->V2: > >> - add commit comment [Stefano] > >> - use failed_post_open [Stefano] > >> - remove redundant assert [Stefano] > >> - add macro+comment for the magic -9000 value [Stefano] > >> - always set *file if its non NULL [Stefano] > >> > >> block/rbd.c | 126 > >> 1 file changed, 126 insertions(+) > >> > >> diff --git a/block/rbd.c b/block/rbd.c > >> index dcf82b15b8..3cb24f9981 100644 > >> --- a/block/rbd.c > >> +++ b/block/rbd.c > >> @@ -1259,6 +1259,131 @@ static ImageInfoSpecific > >> *qemu_rbd_get_specific_info(BlockDriverState *bs, > >> return spec_info; > >> } > 
>> > >> +typedef struct rbd_diff_req { > >> +uint64_t offs; > >> +uint64_t bytes; > >> +int exists; > > Hi Peter, > > > > Nit: make exists a bool. The one in the callback has to be an int > > because of the callback signature but let's not spread that. > > > >> +} rbd_diff_req; > >> + > >> +/* > >> + * rbd_diff_iterate2 allows to interrupt the exection by returning a > >> negative > >> + * value in the callback routine. Choose a value that does not conflict > >> with > >> + * an existing exitcode and return it if we want to prematurely stop the > >> + * execution because we detected a change in the allocation status. > >> + */ > >> +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000 > >> + > >> +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len, > >> + int exists, void *opaque) > >> +{ > >> +struct rbd_diff_req *req = opaque; > >> + > >> +assert(req->offs + req->bytes <= offs); > >> + > >> +if (req->exists && offs > req->offs + req->bytes) { > >> +/* > >> + * we started in an allocated area and jumped over an unallocated > >> area, > >> + * req->bytes contains the length of the allocated area before the > >> + * unallocated area. stop further processing. > >> + */ > >> +return QEMU_RBD_EXIT_DIFF_ITERATE2; > >> +} > >> +if (req->exists && !exists) { > >> +/* > >> + * we started in an allocated area and reached a hole. req->bytes > >> + * contains the length of the allocated area before the hole. > >> + * stop further processing. > >> + */ > >> +return QEMU_RBD_EXIT_DIFF_ITERATE2; > > Do you have a test case for when this branch is taken? > > > That would happen if you diff from a snapshot, the question is if it can also > happen if the image is a clone from a snapshot? > > > > > >> +} > >> +if (!req->exists && exists && offs > req->offs) { > >> +/* > >> + * we started in an unallocated area and hit the first allocated > >> + * block. req->bytes must be set to the length of the unallocated > >> area > >> + * before the allocated area. stop further processing. 
> >> + */ > >> +req->bytes = offs - req->offs; > >> +return QEMU_RBD_EXIT_DIFF_ITERATE2; > >> +} > >> + > >> +/* > >> + * assert that we caught all cases above and allocation state has not > >> + * changed during callbacks. > >> + */ > >> +assert(exists == req->exists || !req->bytes); > >> +req->exists = exists; > >> + > >> +/* > >> + * assert that we either return an unallocated block or have got > >> callbacks > >> + * for all allocated blocks present. > >> + */ > >> +assert(!req->exists || offs == req->offs + req->bytes); > >> +req->bytes = offs + len - req->offs; > >> + > >> +return 0; > >> +} > >> + > >> +static int cor
[PATCH v2 0/3] hw/arm/virt_acpi_build: Upgrate the IORT table up to revision E.b
This series upgrades the ACPI IORT table up to the E.b
specification revision. One of the goals of this upgrade
is to allow the addition of RMR nodes along with the SMMUv3.

It applies on top of Igor's
[PATCH v4 00/35] acpi: refactor error prone build_header() and
packed structures usage in ACPI tables

The latest IORT specification (ARM DEN 0049E.b) can be found at
IO Remapping Table - Platform Design Document
https://developer.arm.com/documentation/den0049/latest/

This series and its dependency can be found at
https://github.com/eauger/qemu.git
branch: igor_acpi_refactoring_v4_dbg2_v3_rmr_v2

History:
v1 -> v2:
- fix Revision value in ITS and SMMUv3 nodes (Phil)
- Increment an identifier (Phil)

Eric Auger (3):
  tests/acpi: Get prepared for IORT E.b revision upgrade
  hw/arm/virt-acpi-build: IORT upgrade up to revision E.b
  tests/acpi: Generate reference blob for IORT rev E.b

 hw/arm/virt-acpi-build.c          |  48 ++
 tests/data/acpi/virt/IORT         | Bin 124 -> 128 bytes
 tests/data/acpi/virt/IORT.memhp   | Bin 124 -> 128 bytes
 tests/data/acpi/virt/IORT.numamem | Bin 124 -> 128 bytes
 tests/data/acpi/virt/IORT.pxb     | Bin 124 -> 128 bytes
 5 files changed, 29 insertions(+), 19 deletions(-)

-- 
2.26.3
[PATCH v2 2/3] hw/arm/virt-acpi-build: IORT upgrade up to revision E.b
Upgrade the IORT table from B to E.b specification revision (ARM DEN 0049E.b). Signed-off-by: Eric Auger --- v1 -> v2: - Fix Revision value for ITS node and SMMUv3 node - increment an identifier --- hw/arm/virt-acpi-build.c | 48 1 file changed, 29 insertions(+), 19 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 257d0fee17..789bac3134 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -241,19 +241,20 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms) #endif #define ID_MAPPING_ENTRY_SIZE 20 -#define SMMU_V3_ENTRY_SIZE 60 -#define ROOT_COMPLEX_ENTRY_SIZE 32 +#define SMMU_V3_ENTRY_SIZE 68 +#define ROOT_COMPLEX_ENTRY_SIZE 36 #define IORT_NODE_OFFSET 48 static void build_iort_id_mapping(GArray *table_data, uint32_t input_base, uint32_t id_count, uint32_t out_ref) { -/* Identity RID mapping covering the whole input RID range */ +/* Table 4 ID mapping format */ build_append_int_noprefix(table_data, input_base, 4); /* Input base */ build_append_int_noprefix(table_data, id_count, 4); /* Number of IDs */ build_append_int_noprefix(table_data, input_base, 4); /* Output base */ build_append_int_noprefix(table_data, out_ref, 4); /* Output Reference */ -build_append_int_noprefix(table_data, 0, 4); /* Flags */ +/* Flags */ +build_append_int_noprefix(table_data, 0 /* Single mapping */, 4); } struct AcpiIortIdMapping { @@ -298,7 +299,7 @@ static int iort_idmap_compare(gconstpointer a, gconstpointer b) /* * Input Output Remapping Table (IORT) * Conforms to "IO Remapping Table System Software on ARM Platforms", - * Document number: ARM DEN 0049B, October 2015 + * Document number: ARM DEN 0049E, Feb 2021 */ static void build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) @@ -307,10 +308,11 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) const uint32_t iort_node_offset = IORT_NODE_OFFSET; size_t node_size, smmu_offset = 0; AcpiIortIdMapping *idmap; +uint32_t id = 
0; GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping)); GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping)); -AcpiTable table = { .sig = "IORT", .rev = 0, .oem_id = vms->oem_id, +AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id, .oem_table_id = vms->oem_table_id }; /* Table 2 The IORT */ acpi_table_begin(&table, table_data); @@ -358,12 +360,12 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) build_append_int_noprefix(table_data, IORT_NODE_OFFSET, 4); build_append_int_noprefix(table_data, 0, 4); /* Reserved */ -/* 3.1.1.3 ITS group node */ +/* Table 12 ITS Group Format */ build_append_int_noprefix(table_data, 0 /* ITS Group */, 1); /* Type */ node_size = 20 /* fixed header size */ + 4 /* 1 GIC ITS Identifier */; build_append_int_noprefix(table_data, node_size, 2); /* Length */ -build_append_int_noprefix(table_data, 0, 1); /* Revision */ -build_append_int_noprefix(table_data, 0, 4); /* Reserved */ +build_append_int_noprefix(table_data, 1, 1); /* Revision */ +build_append_int_noprefix(table_data, id++, 4); /* Identifier */ build_append_int_noprefix(table_data, 0, 4); /* Number of ID mappings */ build_append_int_noprefix(table_data, 0, 4); /* Reference to ID Array */ build_append_int_noprefix(table_data, 1, 4); /* Number of ITSs */ @@ -374,19 +376,19 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE; smmu_offset = table_data->len - table.table_offset; -/* 3.1.1.2 SMMUv3 */ +/* Table 9 SMMUv3 Format */ build_append_int_noprefix(table_data, 4 /* SMMUv3 */, 1); /* Type */ node_size = SMMU_V3_ENTRY_SIZE + ID_MAPPING_ENTRY_SIZE; build_append_int_noprefix(table_data, node_size, 2); /* Length */ -build_append_int_noprefix(table_data, 0, 1); /* Revision */ -build_append_int_noprefix(table_data, 0, 4); /* Reserved */ +build_append_int_noprefix(table_data, 4, 1); /* Revision */ +build_append_int_noprefix(table_data, 
id++, 4); /* Identifier */ build_append_int_noprefix(table_data, 1, 4); /* Number of ID mappings */ /* Reference to ID Array */ build_append_int_noprefix(table_data, SMMU_V3_ENTRY_SIZE, 4); /* Base address */ build_append_int_noprefix(table_data, vms->memmap[VIRT_SMMU].base, 8); /* Flags */ -build_append_int_noprefix(table_data, 1 /* COHACC OverrideNote */, 4); +build_append_int_noprefix(table_data, 1 /* COHACC Override */, 4); build_append_int_noprefix(table_data, 0, 4); /* Reserved */ build_append_int_noprefix(table_da
Re: [PATCH V3] block/rbd: implement bdrv_co_block_status
Am 05.10.21 um 10:36 schrieb Ilya Dryomov: On Tue, Oct 5, 2021 at 10:19 AM Peter Lieven wrote: Am 05.10.21 um 09:54 schrieb Ilya Dryomov: On Thu, Sep 16, 2021 at 2:21 PM Peter Lieven wrote: the qemu rbd driver currently lacks support for bdrv_co_block_status. This results mainly in incorrect progress during block operations (e.g. qemu-img convert with an rbd image as source). This patch utilizes the rbd_diff_iterate2 call from librbd to detect allocated and unallocated (all zero areas). To avoid querying the ceph OSDs for the answer this is only done if the image has the fast-diff feature which depends on the object-map and exclusive-lock features. In this case it is guaranteed that the information is present in memory in the librbd client and thus very fast. If fast-diff is not available all areas are reported to be allocated which is the current behaviour if bdrv_co_block_status is not implemented. Signed-off-by: Peter Lieven --- V2->V3: - check rbd_flags every time (they can change during runtime) [Ilya] - also check for fast-diff invalid flag [Ilya] - *map and *file cant be NULL [Ilya] - set ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID in case of an unallocated area [Ilya] - typo: catched -> caught [Ilya] - changed wording about fast-diff, object-map and exclusive lock in commit msg [Ilya] V1->V2: - add commit comment [Stefano] - use failed_post_open [Stefano] - remove redundant assert [Stefano] - add macro+comment for the magic -9000 value [Stefano] - always set *file if its non NULL [Stefano] block/rbd.c | 126 1 file changed, 126 insertions(+) diff --git a/block/rbd.c b/block/rbd.c index dcf82b15b8..3cb24f9981 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -1259,6 +1259,131 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs, return spec_info; } +typedef struct rbd_diff_req { +uint64_t offs; +uint64_t bytes; +int exists; Hi Peter, Nit: make exists a bool. 
The one in the callback has to be an int because of the callback signature but let's not spread that. +} rbd_diff_req; + +/* + * rbd_diff_iterate2 allows to interrupt the exection by returning a negative + * value in the callback routine. Choose a value that does not conflict with + * an existing exitcode and return it if we want to prematurely stop the + * execution because we detected a change in the allocation status. + */ +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000 + +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len, + int exists, void *opaque) +{ +struct rbd_diff_req *req = opaque; + +assert(req->offs + req->bytes <= offs); + +if (req->exists && offs > req->offs + req->bytes) { +/* + * we started in an allocated area and jumped over an unallocated area, + * req->bytes contains the length of the allocated area before the + * unallocated area. stop further processing. + */ +return QEMU_RBD_EXIT_DIFF_ITERATE2; +} +if (req->exists && !exists) { +/* + * we started in an allocated area and reached a hole. req->bytes + * contains the length of the allocated area before the hole. + * stop further processing. + */ +return QEMU_RBD_EXIT_DIFF_ITERATE2; Do you have a test case for when this branch is taken? That would happen if you diff from a snapshot, the question is if it can also happen if the image is a clone from a snapshot? +} +if (!req->exists && exists && offs > req->offs) { +/* + * we started in an unallocated area and hit the first allocated + * block. req->bytes must be set to the length of the unallocated area + * before the allocated area. stop further processing. + */ +req->bytes = offs - req->offs; +return QEMU_RBD_EXIT_DIFF_ITERATE2; +} + +/* + * assert that we caught all cases above and allocation state has not + * changed during callbacks. + */ +assert(exists == req->exists || !req->bytes); +req->exists = exists; + +/* + * assert that we either return an unallocated block or have got callbacks + * for all allocated blocks present. 
+ */ +assert(!req->exists || offs == req->offs + req->bytes); +req->bytes = offs + len - req->offs; + +return 0; +} + +static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, + bool want_zero, int64_t offset, + int64_t bytes, int64_t *pnum, + int64_t *map, + BlockDriverState **file) +{ +BDRVRBDState *s = bs->opaque; +int ret, r; Nit: I would rename ret to status or something like that to make it clear(er) that it is an actual value and never an error. Or, even better, dro
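For readers following the callback discussion, the stop conditions in the quoted qemu_rbd_co_block_status_cb can be exercised without librbd. The sketch below is an illustration under stated assumptions, not the patch itself: diff_cb mirrors the decision logic of the quoted callback with the asserts left out, while struct extent, iterate() and EXIT_DIFF_ITERATE are hypothetical stand-ins for the extents reported by rbd_diff_iterate2, the iteration itself, and QEMU_RBD_EXIT_DIFF_ITERATE2.

```c
#include <stddef.h>
#include <stdint.h>

/* stand-in for QEMU_RBD_EXIT_DIFF_ITERATE2 */
#define EXIT_DIFF_ITERATE (-9000)

struct diff_req {
    uint64_t offs;    /* where the status query starts */
    uint64_t bytes;   /* length of the run found so far */
    int exists;       /* allocation state of that run */
};

/* Same decision logic as the quoted callback, minus the asserts. */
static int diff_cb(uint64_t offs, size_t len, int exists, void *opaque)
{
    struct diff_req *req = opaque;

    if (req->exists && offs > req->offs + req->bytes) {
        return EXIT_DIFF_ITERATE;          /* gap after an allocated run */
    }
    if (req->exists && !exists) {
        return EXIT_DIFF_ITERATE;          /* explicit hole after a run */
    }
    if (!req->exists && exists && offs > req->offs) {
        req->bytes = offs - req->offs;     /* unallocated run ends here */
        return EXIT_DIFF_ITERATE;
    }
    req->exists = exists;
    req->bytes = offs + len - req->offs;   /* extend the current run */
    return 0;
}

/* Hypothetical extent record and driver standing in for librbd. */
struct extent {
    uint64_t offs;
    size_t len;
    int exists;
};

static int iterate(const struct extent *e, size_t n, struct diff_req *req)
{
    for (size_t i = 0; i < n; i++) {
        int r = diff_cb(e[i].offs, e[i].len, e[i].exists, req);
        if (r < 0) {
            return r;                      /* callback asked to stop */
        }
    }
    return 0;
}
```

Feeding it two adjacent allocated extents followed by a distant one stops at the gap with req.bytes covering only the first two; starting in an unallocated area stops at the first allocated extent with req.bytes set to the size of the hole.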
Re: [PATCH] gitlab: Escape git-describe match pattern on Windows hosts
On Tue, Oct 05, 2021 at 10:40:00AM +0200, Cédric Le Goater wrote:
> > I'm curious if you go to
> >
> >    https://gitlab.com/legoater/qemu/-/settings/ci_cd
> >
> > and expand "General pipelines", what value is set for the
> >
> >    "Git shallow clone"
> >
> > setting. In my fork it is 0 which means unlimited depth, but in
> > gitlab docs I see reference to repos getting this set to 50
> > since a particular gitlab release.
>
> Sorry for the late reply.
>
> Setting the value to 0 fixed the windows build on gitlab.

Ok, so we've got two options

 - Change the code so it has sane fallback if the tags are all
   missing

 - Set GIT_DEPTH in the affected jobs to a value that is larger
   than the maximum number of commits we expect in the course
   of a single dev cycle, plus 20% grace on top, so that we're
   guaranteed enough history to describe one tag.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
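The second option above boils down to a small calculation. As a rough sketch only: the helper name git_depth_with_grace is hypothetical, and the commit count passed to it is whatever estimate the project settles on (no figure is given in this thread).

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest clone depth that covers the expected
 * number of commits in one development cycle plus 20% grace, rounded up.
 */
static uint32_t git_depth_with_grace(uint32_t max_commits_per_cycle)
{
    /* 64-bit intermediate so the multiplication cannot overflow */
    uint64_t depth = ((uint64_t)max_commits_per_cycle * 120 + 99) / 100;

    return depth > UINT32_MAX ? UINT32_MAX : (uint32_t)depth;
}
```

For an assumed 3000 commits per cycle this yields a depth of 3600, which would then be the value assigned to GIT_DEPTH in the affected jobs.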
[PATCH v2 1/3] tests/acpi: Get prepared for IORT E.b revision upgrade
Ignore IORT till reference blob for E.b spec revision gets added.

Signed-off-by: Eric Auger
---
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..9a5a923d6b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/IORT",
-- 
2.26.3
Re: [PATCH] gitlab: Escape git-describe match pattern on Windows hosts
> I'm curious if you go to
>
>    https://gitlab.com/legoater/qemu/-/settings/ci_cd
>
> and expand "General pipelines", what value is set for the
>
>    "Git shallow clone"
>
> setting. In my fork it is 0 which means unlimited depth, but in
> gitlab docs I see reference to repos getting this set to 50
> since a particular gitlab release.

Sorry for the late reply.

Setting the value to 0 fixed the windows build on gitlab.

Thanks,

C.
[PATCH v2 3/3] tests/acpi: Generate reference blob for IORT rev E.b
Re-generate reference blobs with rebuild-expected-aml.sh. Signed-off-by: Eric Auger --- tests/qtest/bios-tables-test-allowed-diff.h | 1 - tests/data/acpi/virt/IORT | Bin 124 -> 128 bytes tests/data/acpi/virt/IORT.memhp | Bin 124 -> 128 bytes tests/data/acpi/virt/IORT.numamem | Bin 124 -> 128 bytes tests/data/acpi/virt/IORT.pxb | Bin 124 -> 128 bytes 5 files changed, 1 deletion(-) diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h index 9a5a923d6b..dfb8523c8b 100644 --- a/tests/qtest/bios-tables-test-allowed-diff.h +++ b/tests/qtest/bios-tables-test-allowed-diff.h @@ -1,2 +1 @@ /* List of comma-separated changed AML files to ignore */ -"tests/data/acpi/virt/IORT", diff --git a/tests/data/acpi/virt/IORT b/tests/data/acpi/virt/IORT index 521acefe9ba66706c5607321a82d330586f3f280..7efd0ce8a6b3928efa7e1373f688ab4c5f50543b 100644 GIT binary patch literal 128 zcmebD4+?2uU|?Y0?Bwt45v<@85#X!<1dKp25F11@0kHuPgMkDCNC*yK93~3}W)K^M VRiHGGVg_O`aDdYP|3ers^8jQz3IPBB literal 124 zcmebD4+^Pa00MR=e`k+i1*eDrX9XZ&1PX!JAesq?4S*O7Bw!2(4Uz`|CKCt^;wu0# QRGb+i3L*dhhtM#y0PN=p0RR91 diff --git a/tests/data/acpi/virt/IORT.memhp b/tests/data/acpi/virt/IORT.memhp index 521acefe9ba66706c5607321a82d330586f3f280..7efd0ce8a6b3928efa7e1373f688ab4c5f50543b 100644 GIT binary patch literal 128 zcmebD4+?2uU|?Y0?Bwt45v<@85#X!<1dKp25F11@0kHuPgMkDCNC*yK93~3}W)K^M VRiHGGVg_O`aDdYP|3ers^8jQz3IPBB literal 124 zcmebD4+^Pa00MR=e`k+i1*eDrX9XZ&1PX!JAesq?4S*O7Bw!2(4Uz`|CKCt^;wu0# QRGb+i3L*dhhtM#y0PN=p0RR91 diff --git a/tests/data/acpi/virt/IORT.numamem b/tests/data/acpi/virt/IORT.numamem index 521acefe9ba66706c5607321a82d330586f3f280..7efd0ce8a6b3928efa7e1373f688ab4c5f50543b 100644 GIT binary patch literal 128 zcmebD4+?2uU|?Y0?Bwt45v<@85#X!<1dKp25F11@0kHuPgMkDCNC*yK93~3}W)K^M VRiHGGVg_O`aDdYP|3ers^8jQz3IPBB literal 124 zcmebD4+^Pa00MR=e`k+i1*eDrX9XZ&1PX!JAesq?4S*O7Bw!2(4Uz`|CKCt^;wu0# QRGb+i3L*dhhtM#y0PN=p0RR91 diff --git a/tests/data/acpi/virt/IORT.pxb 
b/tests/data/acpi/virt/IORT.pxb index 521acefe9ba66706c5607321a82d330586f3f280..7efd0ce8a6b3928efa7e1373f688ab4c5f50543b 100644 GIT binary patch literal 128 zcmebD4+?2uU|?Y0?Bwt45v<@85#X!<1dKp25F11@0kHuPgMkDCNC*yK93~3}W)K^M VRiHGGVg_O`aDdYP|3ers^8jQz3IPBB literal 124 zcmebD4+^Pa00MR=e`k+i1*eDrX9XZ&1PX!JAesq?4S*O7Bw!2(4Uz`|CKCt^;wu0# QRGb+i3L*dhhtM#y0PN=p0RR91 -- 2.26.3
Re: Deprecate the ppc405 boards in QEMU?
On 05/10/2021 10.07, Thomas Huth wrote:
> On 05/10/2021 10.05, Alexey Kardashevskiy wrote:
> [...]
>> What is so special about taihu?
>
> taihu is the other 405 board defined in hw/ppc/ppc405_boards.c (which I
> suggested to deprecate now)

I've now also played with the u-boot sources a little bit, and with a bit
of tweaking, it's indeed possible to compile the old taihu board there.
However, it does not really work with QEMU anymore; it immediately triggers
an assert():

$ qemu-system-ppc -M taihu -bios u-boot.bin -serial null -serial mon:stdio
**
ERROR:accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed: (qemu_mutex_iothread_locked())
Aborted (core dumped)

Going back to QEMU v2.3.0, I can see at least a little bit of output, but it
then also triggers an assert() during DRAM initialization:

$ qemu-system-ppc -M taihu -bios u-boot.bin -serial null -serial mon:stdio

Reset PowerPC core

U-Boot 2014.10-rc2-00123-g461be2f96e-dirty (Oct 05 2021 - 10:02:56)

CPU:   AMCC PowerPC 405EP Rev. B at 770 MHz (PLB=256 OPB=128 EBC=128)
       I2C boot EEPROM disabled
       Internal PCI arbiter enabled
       16 KiB I-Cache 16 KiB D-Cache
Board: Taihu - AMCC PPC405EP Evaluation Board
I2C:   ready
DRAM:  qemu-system-ppc: memory.c:1693: memory_region_del_subregion: Assertion `subregion->container == mr' failed.
Aborted (core dumped)

Not sure if this ever worked in QEMU, maybe in the early 0.15 time, but that
version of QEMU also does not compile easily anymore on modern systems.

So I'm afraid getting this into a workable shape again will take a lot of
time. At least I'll stop my efforts here now.

 Thomas
[RFC v2 1/2] hw/pci-host/gpex: Allow to generate preserve boot config DSM #5
Add a 'preserve_config' field in struct GPEXConfig and
if set generate the DSM #5 for preserving PCI boot configurations.
The DSM presence is needed to expose RMRs.

At the moment the DSM generation is not yet enabled.

Signed-off-by: Eric Auger
---
 include/hw/pci-host/gpex.h |  1 +
 hw/pci-host/gpex-acpi.c    | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index fcf8b63820..3f8f8ec38d 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -64,6 +64,7 @@ struct GPEXConfig {
     MemMapEntry pio;
     int irq;
     PCIBus *bus;
+    bool preserve_config;
 };
 
 int gpex_set_irq_num(GPEXHost *s, int index, int gsi);
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index e7e162a00a..7dab259379 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -164,6 +164,12 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
                 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
             }
 
+            if (cfg->preserve_config) {
+                method = aml_method("_DSM", 5, AML_SERIALIZED);
+                aml_append(method, aml_return(aml_int(0)));
+                aml_append(dev, method);
+            }
+
             acpi_dsdt_add_pci_route_table(dev, cfg->irq);
 
             /*
@@ -191,6 +197,12 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
     aml_append(dev, aml_name_decl("_STR", aml_unicode("PCIe 0 Device")));
     aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
 
+    if (cfg->preserve_config) {
+        method = aml_method("_DSM", 5, AML_SERIALIZED);
+        aml_append(method, aml_return(aml_int(0)));
+        aml_append(dev, method);
+    }
+
     acpi_dsdt_add_pci_route_table(dev, cfg->irq);
 
     method = aml_method("_CBA", 0, AML_NOTSERIALIZED);
-- 
2.26.3
Re: [PATCH 2/4] aspeed/smc: Dump address offset in trace events
On [2021 Oct 04] Mon 17:46:33, Cédric Le Goater wrote:
> The register index is currently printed and this is confusing.
>
> Signed-off-by: Cédric Le Goater

Reviewed-by: Francisco Iglesias

> ---
>  hw/ssi/aspeed_smc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
> index 7129341c129e..8a988c167604 100644
> --- a/hw/ssi/aspeed_smc.c
> +++ b/hw/ssi/aspeed_smc.c
> @@ -728,7 +728,7 @@ static uint64_t aspeed_smc_read(void *opaque, hwaddr addr, unsigned int size)
>           addr < R_SEG_ADDR0 + asc->max_peripherals) ||
>          (addr >= s->r_ctrl0 && addr < s->r_ctrl0 + asc->max_peripherals)) {
>
> -        trace_aspeed_smc_read(addr, size, s->regs[addr]);
> +        trace_aspeed_smc_read(addr << 2, size, s->regs[addr]);
>
>          return s->regs[addr];
>      } else {
> @@ -1029,10 +1029,10 @@ static void aspeed_smc_write(void *opaque, hwaddr addr, uint64_t data,
>      AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s);
>      uint32_t value = data;
>
> -    addr >>= 2;
> -
>      trace_aspeed_smc_write(addr, size, data);
>
> +    addr >>= 2;
> +
>      if (addr == s->r_conf ||
>          (addr >= s->r_timings &&
>           addr < s->r_timings + asc->nregs_timings) ||
> --
> 2.31.1
>
>
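The distinction the patch relies on is the usual one between a byte offset into the MMIO region and an index into an array of 32-bit registers. A tiny illustration, with hypothetical helper names that do not exist in hw/ssi/aspeed_smc.c:

```c
#include <stdint.h>

/*
 * Registers are 32 bits wide, so a register index and its byte offset
 * differ by a factor of 4. Helper names are illustrative only.
 */
static inline uint64_t reg_index_from_offset(uint64_t offset)
{
    return offset >> 2;
}

static inline uint64_t reg_offset_from_index(uint64_t index)
{
    return index << 2;
}
```

In the read path the variable already holds the index when the trace point fires, so the patch shifts it back up for tracing; in the write path the trace call is simply moved ahead of the down-shift so that it sees the original offset.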
Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)
On Tue, Oct 05, 2021 at 06:44:23AM +0200, Christophe Leroy wrote:
>
>
> On 05/10/2021 at 02:48, David Gibson wrote:
> > On Fri, Oct 01, 2021 at 04:18:49PM +0200, Thomas Huth wrote:
> > > On 01/10/2021 15.04, Christophe Leroy wrote:
> > > >
> > > >
> > > > On 01/10/2021 at 14:04, Thomas Huth wrote:
> > > > > On 01/10/2021 13.12, Peter Maydell wrote:
> > > > > > On Fri, 1 Oct 2021 at 10:43, Thomas Huth wrote:
> > > > > > > Nevertheless, as long as nobody has a hint where to find that
> > > > > > > ppc405_rom.bin, I think both boards are pretty useless in QEMU
> > > > > > > (as far as I can see, they do not work without the bios at all,
> > > > > > > so it's also not possible to use a Linux image with the
> > > > > > > "-kernel" CLI option directly).
> > > > > >
> > > > > > It is at least in theory possible to run bare-metal code on
> > > > > > either board, by passing either a pflash or a bios argument.
> > > > >
> > > > > True. I did some more research, and seems like there was once
> > > > > support for those boards in u-boot, but it got removed there a
> > > > > couple of years ago already:
> > > > >
> > > > > https://gitlab.com/qemu-project/u-boot/-/commit/98f705c9cefdf
> > > > >
> > > > > https://gitlab.com/qemu-project/u-boot/-/commit/b147ff2f37d5b
> > > > >
> > > > > https://gitlab.com/qemu-project/u-boot/-/commit/7514037bcdc37
> > > > >
> > > > > > But I agree that there seem to be no signs of anybody actually
> > > > > > successfully using these boards for anything, so we should
> > > > > > deprecate-and-delete them.
> > > > >
> > > > > Yes, let's mark them as deprecated now ... if someone still uses
> > > > > them and speaks up, we can still revert the deprecation again.
> > > >
> > > > I really would like to be able to use them to validate Linux Kernel
> > > > changes, hence looking for that missing BIOS.
> > > >
> > > > If we remove ppc405 from QEMU, we won't be able to do any regression
> > > > tests of Linux Kernel on those processors.
> > >
> > > If you/someone managed to compile an old version of u-boot for one of
> > > these two boards, so that we would finally have something for
> > > regression testing, we can of course also keep the boards in QEMU...
> >
> > I can see that it would be useful for some cases, but unless someone
> > volunteers to track down the necessary firmware and look after it, I
> > think we do need to deprecate it - I certainly don't have the capacity
> > to look into this.
> >
>
> I will look at it, please allow me a few weeks though.

Once something is deprecated, it remains in QEMU for a minimum of two
release cycles, before being deleted. At any time in that deprecation
period it can be returned to supported status, if someone provides a
good enough justification to keep it.

IOW, we can deprecate this now, and you still have plenty of time to
investigate more.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
[RFC v2 0/2] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
To handle SMMUv3 nested stage support it is practical to expose
reserved memory regions (RMRs) to the guest, covering the IOVAs
used by the host kernel to map physical MSI doorbells.

Those IOVAs belong to [0x8000000, 0x8100000] matching
MSI_IOVA_BASE and MSI_IOVA_LENGTH definitions in kernel
arm-smmu-v3 driver. This is the window used to allocate
IOVAs matching physical MSI doorbells.

With those RMRs, the guest is forced to use a flat mapping for this
range. Hence the assigned device is programmed with one IOVA from
this range. Stage 1, owned by the guest, has a flat mapping for
this IOVA. Stage 2, owned by the VMM, then enforces a mapping from
this IOVA to the physical MSI doorbell.

At IORT table level, due to the single mapping flag being set on
the ID mapping, 256 IORT RMR nodes need to be created per bus.
This looks awkward from a specification and implementation point
of view.

This may also produce a warning at execution time:
qemu-system-aarch64: warning: ACPI table size 114709 exceeds 65536 bytes, migration may not work
(here with 5 pcie root ports, ie. 256 * 6 = 1536 RMR nodes!).

The creation of those RMR nodes is only relevant if nested
stage SMMU is in use, along with VFIO. As VFIO devices can be
hotplugged, all RMRs need to be created in advance. Hence
the patch introduces a new arm virt "nested-smmuv3" iommu type.

ARM DEN 0049E.b IORT specification also mandates that when
RMRs are present, the OS must preserve PCIe configuration
performed by the boot FW. So along with the RMR IORT nodes,
a _DSM function #5, as defined by PCI FIRMWARE SPECIFICATION
REVISION 3.3, chapter 4.6.5, is added to PCIe host bridge
and PCIe expander bridge objects.

The series applies on top of Igor's
[1] [PATCH v4 00/35] acpi: refactor error prone build_header() and
    packed structures usage in ACPI tables
and
[2] [PATCH v2 0/3] hw/arm/virt_acpi_build: Upgrate the IORT table up to
    revision E.b

The guest can use RMRs with Shameer's series:
[3] [PATCH v7 0/9] ACPI/IORT: Support for IORT RMR node

The latest IORT specification (ARM DEN 0049E.b) can be found at
IO Remapping Table - Platform Design Document
https://developer.arm.com/documentation/den0049/latest/

This series and its dependency can be found at
https://github.com/eauger/qemu.git
branch: igor_acpi_refactoring_v4_dbg2_v3_rmr_v2

History:
v1 -> v2:
- add DSM #5

Eric Auger (2):
  hw/pci-host/gpex: Allow to generate preserve boot config DSM #5
  hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested
    binding

 include/hw/arm/virt.h      |  7 
 include/hw/pci-host/gpex.h |  1 +
 hw/arm/virt-acpi-build.c   | 84 --
 hw/arm/virt.c              |  7 +++-
 hw/pci-host/gpex-acpi.c    | 12 ++
 5 files changed, 98 insertions(+), 13 deletions(-)

-- 
2.26.3
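The node count quoted in the cover letter follows from one RMR node per requester ID. A back-of-the-envelope sketch, assuming one secondary bus per root port in addition to the root bus; the helper name rmr_node_count is hypothetical and not part of the series:

```c
#include <stdint.h>

/* 32 devices * 8 functions: requester IDs available on one PCI bus */
#define RIDS_PER_BUS 256u

/*
 * Hypothetical helper: number of single-mapping RMR nodes needed when
 * every requester ID on every bus gets its own node. Assumes the root
 * bus plus one secondary bus per root port.
 */
static uint32_t rmr_node_count(uint32_t root_ports)
{
    return RIDS_PER_BUS * (root_ports + 1);
}
```

With 5 root ports this gives the 256 * 6 = 1536 nodes mentioned above, which is why the table grows past the 64 KiB threshold in the warning.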
[RFC v2 2/2] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
To handle SMMUv3 nested stage support it is practical to expose
reserved memory regions (RMRs) to the guest, covering the IOVAs
used by the host kernel to map physical MSI doorbells.

Those IOVAs belong to [0x8000000, 0x8100000] matching
MSI_IOVA_BASE and MSI_IOVA_LENGTH definitions in kernel
arm-smmu-v3 driver. This is the window used to allocate
IOVAs matching physical MSI doorbells.

With those RMRs, the guest is forced to use a flat mapping for this
range. Hence the assigned device is programmed with one IOVA from
this range. Stage 1, owned by the guest, has a flat mapping for
this IOVA. Stage 2, owned by the VMM, then enforces a mapping from
this IOVA to the physical MSI doorbell.

At IORT table level, due to the single mapping flag being set on
the ID mapping, 256 IORT RMR nodes need to be created per bus.
This looks awkward from a specification and implementation point
of view.

This may also produce a warning at execution time:
qemu-system-aarch64: warning: ACPI table size 114709 exceeds 65536 bytes, migration may not work
(here with 5 pcie root ports, ie. 256 * 6 = 1536 RMR nodes!).

The creation of those RMR nodes is only relevant if nested
stage SMMU is in use, along with VFIO. As VFIO devices can be
hotplugged, all RMRs need to be created in advance. Hence
the patch introduces a new arm virt "nested-smmuv3" iommu type.

ARM DEN 0049E.b IORT specification also mandates that when
RMRs are present, the OS must preserve PCIe configuration
performed by the boot FW. So along with the RMR IORT nodes,
a _DSM function #5, as defined by PCI FIRMWARE SPECIFICATION
REVISION 3.3, chapter 4.6.5, is added to PCIe host bridge
and PCIe expander bridge objects.

Signed-off-by: Eric Auger
Suggested-by: Jean-Philippe Brucker

---

v1 -> v2:
- add DSM #5
- use identifier increment

Instead of introducing a new IOMMU type, we could introduce
an array of qdev_prop_reserved_region(s).
The guest can parse the IORT RMR nodes with Shameer's series:
[PATCH v7 0/9] ACPI/IORT: Support for IORT RMR node

The patch applies on Igor's v4 series [1] + IORT E.b upgrade [2]

[1] [PATCH v4 00/35] acpi: refactor error prone build_header() and packed structures usage in ACPI tables
[2] [PATCH 0/3] hw/arm/virt_acpi_build: Upgrate the IORT table up to revision E.b

--- include/hw/arm/virt.h| 7 hw/arm/virt-acpi-build.c | 84 ++-- hw/arm/virt.c| 7 +++- 3 files changed, 85 insertions(+), 13 deletions(-) diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index b461b8d261..f2f8aee219 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -99,6 +99,7 @@ enum { typedef enum VirtIOMMUType { VIRT_IOMMU_NONE, VIRT_IOMMU_SMMUV3, +VIRT_IOMMU_NESTED_SMMUV3, VIRT_IOMMU_VIRTIO, } VirtIOMMUType; @@ -190,4 +191,10 @@ static inline int virt_gicv3_redist_region_count(VirtMachineState *vms) return MACHINE(vms)->smp.cpus > redist0_capacity ? 2 : 1; } +static inline bool virt_has_smmuv3(const VirtMachineState *vms) +{ +return vms->iommu == VIRT_IOMMU_SMMUV3 || + vms->iommu == VIRT_IOMMU_NESTED_SMMUV3; +} + #endif /* QEMU_ARM_VIRT_H */ diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 789bac3134..7260e47c83 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -169,6 +169,14 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, .bus= vms->bus, }; +/* + * Nested SMMU requires RMRs for MSI 1-1 mapping, which + * require _DSM for PreservingPCI Boot Configurations + */ +if (vms->iommu == VIRT_IOMMU_NESTED_SMMUV3) { +cfg.preserve_config = true; +} + if (use_highmem) { cfg.mmio64 = memmap[VIRT_HIGH_PCIE_MMIO]; } @@ -245,16 +253,16 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms) #define ROOT_COMPLEX_ENTRY_SIZE 36 #define IORT_NODE_OFFSET 48 -static void build_iort_id_mapping(GArray *table_data, uint32_t input_base, - uint32_t id_count, uint32_t out_ref) +static void +build_iort_id_mapping(GArray 
*table_data, uint32_t input_base, + uint32_t id_count, uint32_t out_ref, uint32_t flags) { /* Table 4 ID mapping format */ build_append_int_noprefix(table_data, input_base, 4); /* Input base */ build_append_int_noprefix(table_data, id_count, 4); /* Number of IDs */ build_append_int_noprefix(table_data, input_base, 4); /* Output base */ build_append_int_noprefix(table_data, out_ref, 4); /* Output Reference */ -/* Flags */ -build_append_int_noprefix(table_data, 0 /* Single mapping */, 4); +build_append_int_noprefix(table_data, flags, 4); /* Flags */ } struct AcpiIortIdMapping { @@ -296,6 +304,50 @@ static int iort_idmap_compare(gconstpointer a, gconstpointer b) return idmap_a->input_base - idmap_b->input_base; } +static void +build_iort_rmr_nodes(GArray *table_data, GArray *smmu_idmaps, int smmu
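For readers following the build_iort_id_mapping() change above, here is a minimal standalone sketch of the 20-byte ID mapping record it emits: five 4-byte fields, appended in little-endian order, with the output base equal to the input base as in the patch. This is not QEMU code. The names put_le32, iort_id_mapping_encode and IORT_ID_MAPPING_SIZE are hypothetical, and a plain byte buffer stands in for the GArray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IORT_ID_MAPPING_SIZE 20 /* five 4-byte fields (hypothetical name) */

/* Store a 32-bit value at buf[off] in little-endian order (hypothetical helper). */
static size_t put_le32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off + 0] = (uint8_t)(v & 0xff);
    buf[off + 1] = (uint8_t)((v >> 8) & 0xff);
    buf[off + 2] = (uint8_t)((v >> 16) & 0xff);
    buf[off + 3] = (uint8_t)((v >> 24) & 0xff);
    return off + 4;
}

/*
 * Encode one ID mapping in the field order used by the patch:
 * input base, number of IDs, output base, output reference, flags.
 * The output base repeats the input base (identity mapping).
 * Returns the number of bytes written.
 */
static size_t iort_id_mapping_encode(uint8_t *buf, size_t cap,
                                     uint32_t input_base, uint32_t id_count,
                                     uint32_t out_ref, uint32_t flags)
{
    size_t off = 0;

    assert(cap >= IORT_ID_MAPPING_SIZE);
    off = put_le32(buf, off, input_base); /* Input base */
    off = put_le32(buf, off, id_count);   /* Number of IDs */
    off = put_le32(buf, off, input_base); /* Output base */
    off = put_le32(buf, off, out_ref);    /* Output Reference */
    off = put_le32(buf, off, flags);      /* Flags */
    return off;
}
```

The only functional change the patch makes to this helper is that the flags word becomes a caller-supplied parameter instead of a hard-coded 0.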
Re: [PATCH 1/4] aspeed/wdt: Add trace events
On [2021 Oct 04] Mon 17:46:32, Cédric Le Goater wrote: > Signed-off-by: Cédric Le Goater Reviewed-by: Francisco Iglesias > --- > hw/watchdog/wdt_aspeed.c | 5 + > hw/watchdog/trace-events | 4 > 2 files changed, 9 insertions(+) > > diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c > index 69c37af9a6e9..146ffcd71301 100644 > --- a/hw/watchdog/wdt_aspeed.c > +++ b/hw/watchdog/wdt_aspeed.c > @@ -19,6 +19,7 @@ > #include "hw/sysbus.h" > #include "hw/watchdog/wdt_aspeed.h" > #include "migration/vmstate.h" > +#include "trace.h" > > #define WDT_STATUS (0x00 / 4) > #define WDT_RELOAD_VALUE(0x04 / 4) > @@ -60,6 +61,8 @@ static uint64_t aspeed_wdt_read(void *opaque, hwaddr > offset, unsigned size) > { > AspeedWDTState *s = ASPEED_WDT(opaque); > > +trace_aspeed_wdt_read(offset, size); > + > offset >>= 2; > > switch (offset) { > @@ -140,6 +143,8 @@ static void aspeed_wdt_write(void *opaque, hwaddr offset, > uint64_t data, > AspeedWDTClass *awc = ASPEED_WDT_GET_CLASS(s); > bool enable; > > +trace_aspeed_wdt_write(offset, size, data); > + > offset >>= 2; > > switch (offset) { > diff --git a/hw/watchdog/trace-events b/hw/watchdog/trace-events > index c3bafbffa911..e7523e22aaf2 100644 > --- a/hw/watchdog/trace-events > +++ b/hw/watchdog/trace-events > @@ -5,3 +5,7 @@ cmsdk_apb_watchdog_read(uint64_t offset, uint64_t data, > unsigned size) "CMSDK AP > cmsdk_apb_watchdog_write(uint64_t offset, uint64_t data, unsigned size) > "CMSDK APB watchdog write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u" > cmsdk_apb_watchdog_reset(void) "CMSDK APB watchdog: reset" > cmsdk_apb_watchdog_lock(uint32_t lock) "CMSDK APB watchdog: lock %" PRIu32 > + > +# wdt-aspeed.c > +aspeed_wdt_read(uint64_t addr, uint32_t size) "@0x%" PRIx64 " size=%d" > +aspeed_wdt_write(uint64_t addr, uint32_t size, uint64_t data) "@0x%" PRIx64 > " size=%d value=0x%"PRIx64 > -- > 2.31.1 > >
Re: [PATCH v4 05/11] hw/arm/virt: Use object_property_set instead of qdev_prop_set
On 10/1/21 7:33 PM, Jean-Philippe Brucker wrote:
> To propagate errors to the caller of the pre_plug callback, use the
> object_property_set*() functions directly instead of the qdev_prop_set*()
> helpers.
>
> Suggested-by: Igor Mammedov
> Signed-off-by: Jean-Philippe Brucker

Reviewed-by: Eric Auger

Eric

> ---
>  hw/arm/virt.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 36f0261ef4..ac307b6030 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2465,8 +2465,9 @@ static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>                                      db_start, db_end,
>                                      VIRTIO_IOMMU_RESV_MEM_T_MSI);
>
> -        qdev_prop_set_uint32(dev, "len-reserved-regions", 1);
> -        qdev_prop_set_string(dev, "reserved-regions[0]", resv_prop_str);
> +        object_property_set_uint(OBJECT(dev), "len-reserved-regions", 1, errp);
> +        object_property_set_str(OBJECT(dev), "reserved-regions[0]",
> +                                resv_prop_str, errp);
>          g_free(resv_prop_str);
>      }
>  }
[Bug 1884169] Re: There is no option group 'fsdev' for OSX
But actually macOS (OS X) does support 9pfs, and it ships its own AppleVirtIO9PVFS, which makes things a bit strange. Wouldn't using AppleVirtIO9PVFS be a good workaround?

All my best,
Waheed

https://bugs.launchpad.net/bugs/1884169

Title:
  There is no option group 'fsdev' for OSX

Status in QEMU:
  Opinion

Bug description:
  When I try to use -fsoption on OSX I receive this error:

  -fsdev local,security_model=mapped,id=fsdev0,path=devel/dmos-example: There is no option group 'fsdev'
Re: [PATCH] virtiofsd: xattr mapping add a new type "unsupported"
* Vivek Goyal (vgo...@redhat.com) wrote: > Right now for xattr remapping, we support types of "prefix", "ok" or "bad". > Type "bad" returns -EPERM on setxattr and hides xattr in listxattr. For > getxattr, mapping code returns -EPERM but getxattr code converts it to > -ENODATA. > > I need a new semantics where if an xattr is unsupported, then > getxattr()/setxattr() return -ENOTSUP and listxattr() should hide the xattr. > This is needed to simulate that security.selinux is not supported by > virtiofs filesystem and in that case client falls back to some default > label specified by policy. > > So add a new type "unsupported" which returns -ENOTSUP on getxattr() and > setxattr() and hides xattrs in listxattr(). > > For example, one can use following mapping rule to not support > security.selinux xattr and allow others. > > "-o xattrmap=/unsupported/all/security.selinux/security.selinux//ok/all///" > > Suggested-by: "Dr. David Alan Gilbert" > Signed-off-by: Vivek Goyal Yes, that's nice and simple. Reviewed-by: Dr. David Alan Gilbert > --- > docs/tools/virtiofsd.rst |6 ++ > tools/virtiofsd/passthrough_ll.c | 17 ++--- > 2 files changed, 20 insertions(+), 3 deletions(-) > > Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c > === > --- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c 2021-09-22 > 08:37:16.070377732 -0400 > +++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c2021-09-22 > 14:17:09.543016250 -0400 > @@ -2465,6 +2465,11 @@ static void lo_flock(fuse_req_t req, fus > * Automatically reversed on read > */ > #define XATTR_MAP_FLAG_PREFIX (1 << 2) > +/* > + * The attribute is unsupported; > + * ENOTSUP on write, hidden on read. 
> + */ > +#define XATTR_MAP_FLAG_UNSUPPORTED (1 << 3) > > /* scopes */ > /* Apply rule to get/set/remove */ > @@ -2636,6 +2641,8 @@ static void parse_xattrmap(struct lo_dat > tmp_entry.flags |= XATTR_MAP_FLAG_OK; > } else if (strstart(map, "bad", &map)) { > tmp_entry.flags |= XATTR_MAP_FLAG_BAD; > +} else if (strstart(map, "unsupported", &map)) { > +tmp_entry.flags |= XATTR_MAP_FLAG_UNSUPPORTED; > } else if (strstart(map, "map", &map)) { > /* > * map is sugar that adds a number of rules, and must be > @@ -2646,8 +2653,8 @@ static void parse_xattrmap(struct lo_dat > } else { > fuse_log(FUSE_LOG_ERR, > "%s: Unexpected type;" > - "Expecting 'prefix', 'ok', 'bad' or 'map' in rule > %zu\n", > - __func__, lo->xattr_map_nentries); > + "Expecting 'prefix', 'ok', 'bad', 'unsupported' or > 'map'" > + " in rule %zu\n", __func__, lo->xattr_map_nentries); > exit(1); > } > > @@ -2749,6 +2756,9 @@ static int xattr_map_client(const struct > if (cur_entry->flags & XATTR_MAP_FLAG_BAD) { > return -EPERM; > } > +if (cur_entry->flags & XATTR_MAP_FLAG_UNSUPPORTED) { > +return -ENOTSUP; > +} > if (cur_entry->flags & XATTR_MAP_FLAG_OK) { > /* Unmodified name */ > return 0; > @@ -2788,7 +2798,8 @@ static int xattr_map_server(const struct > > if ((cur_entry->flags & XATTR_MAP_FLAG_SERVER) && > (strstart(server_name, cur_entry->prepend, &end))) { > -if (cur_entry->flags & XATTR_MAP_FLAG_BAD) { > +if (cur_entry->flags & XATTR_MAP_FLAG_BAD || > +cur_entry->flags & XATTR_MAP_FLAG_UNSUPPORTED) { > return -ENODATA; > } > if (cur_entry->flags & XATTR_MAP_FLAG_OK) { > Index: rhvgoyal-qemu/docs/tools/virtiofsd.rst > === > --- rhvgoyal-qemu.orig/docs/tools/virtiofsd.rst 2021-09-22 > 08:37:15.938372097 -0400 > +++ rhvgoyal-qemu/docs/tools/virtiofsd.rst2021-09-22 14:44:09.814188712 > -0400 > @@ -183,6 +183,12 @@ Using ':' as the separator a rule is of >'ok' as either an explicit terminator or for special handling of certain >patterns. 
> > +- 'unsupported' - If a client tries to use a name matching 'key' it's > + denied using ENOTSUP; when the server passes an attribute > + name matching 'prepend' it's hidden. In many ways it's use is very like > + 'ok' as either an explicit terminator or for special handling of certain > + patterns. > + > **key** is a string tested as a prefix on an attribute name originating > on the client. It maybe empty in which case a 'client' rule > will always match on client names. > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
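As an illustration of the client-side behaviour discussed in this thread, the following is a simplified standalone sketch, not the virtiofsd implementation. It models only the rule type and the 'key' prefix match, and it returns the same codes the patch uses in xattr_map_client(): 0 for 'ok', -EPERM for 'bad', -ENOTSUP for 'unsupported'. The names xattr_rule, xattr_rule_type and xattr_client_check are hypothetical, and the -EPERM fallback when no rule matches is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced rule model: type plus client-side key prefix. */
enum xattr_rule_type { RULE_OK, RULE_BAD, RULE_UNSUPPORTED };

struct xattr_rule {
    enum xattr_rule_type type;
    const char *key; /* prefix tested on the client name; "" matches all */
};

/*
 * Walk the rules in order and act on the first one whose key is a
 * prefix of the client-supplied attribute name.
 */
static int xattr_client_check(const struct xattr_rule *rules, size_t n,
                              const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strncmp(name, rules[i].key, strlen(rules[i].key)) != 0) {
            continue; /* key is not a prefix of this name */
        }
        switch (rules[i].type) {
        case RULE_BAD:
            return -EPERM;
        case RULE_UNSUPPORTED:
            return -ENOTSUP;
        case RULE_OK:
            return 0; /* name passes through unmodified */
        }
    }
    return -EPERM; /* assumption of this sketch: deny when nothing matches */
}
```

With a two-entry table equivalent to the example rule in the commit message ('unsupported' for security.selinux followed by 'ok' for everything else), a lookup of security.selinux yields -ENOTSUP while any other name is accepted.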
Re: [PATCH v2 08/12] macfb: add common monitor modes supported by the MacOS toolbox ROM
On 04/10/2021 at 23:19, Mark Cave-Ayland wrote:
> The monitor modes table is found by experimenting with the Monitors Control
> Panel in MacOS and analysing the reads/writes. From this it can be found that
> the mode is controlled by writes to the DAFB_MODE_CTRL1 and DAFB_MODE_CTRL2
> registers.
>
> Implement the first block of DAFB registers as a register array including the
> existing sense register, the newly discovered control registers above, and also
> the DAFB_MODE_VADDR1 and DAFB_MODE_VADDR2 registers which are used by NetBSD to
> determine the current video mode.
>
> These experiments also show that the offset of the start of video RAM and the
> stride can change depending upon the monitor mode, so update macfb_draw_graphic()
> and both the BI_MAC_VADDR and BI_MAC_VROW bootinfo for the q800 machine
> accordingly.
>
> Finally update macfb_common_realize() so that only the resolution and depth
> supported by the display type can be specified on the command line.
>
> Signed-off-by: Mark Cave-Ayland
> Reviewed-by: Laurent Vivier
> ---
>  hw/display/macfb.c | 124 -
>  hw/display/trace-events | 1 +
>  hw/m68k/q800.c | 11 ++--
>  include/hw/display/macfb.h | 16 -
>  4 files changed, 131 insertions(+), 21 deletions(-)
>
> diff --git a/hw/display/macfb.c b/hw/display/macfb.c
> index f98bcdec2d..357fe18be5 100644
> --- a/hw/display/macfb.c
> +++ b/hw/display/macfb.c
> ...
> +static MacFbMode *macfb_find_mode(MacfbDisplayType display_type,
> +                                  uint16_t width, uint16_t height,
> +                                  uint8_t depth)
> +{
> +    MacFbMode *macfb_mode;
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(macfb_mode_table); i++) {
> +        macfb_mode = &macfb_mode_table[i];
> +
> +        if (display_type == macfb_mode->type && width == macfb_mode->width &&
> +                height == macfb_mode->height && depth == macfb_mode->depth) {
> +            return macfb_mode;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +

I misunderstood this part when I reviewed v1...

Does it mean you have to provide the monitor type to QEMU to switch from the default mode?

But, as a user, how do we know which modes are allowed with which resolution? Is it possible to set the type internally here according to the resolution?

Could you provide a command line example showing how to start the q800 with the 1152x870 resolution?

Thanks,
Laurent
Re: [PATCH 2/3] vdpa: Add vhost_vdpa_section_end
On Tue, Oct 5, 2021 at 10:15 AM Michael S. Tsirkin wrote: > > On Tue, Oct 05, 2021 at 10:01:30AM +0200, Eugenio Pérez wrote: > > Abstract this operation, that will be reused when validating the region > > against the iova range that the device supports. > > > > Signed-off-by: Eugenio Pérez > > Note that as defined end is actually 1 byte beyond end of section. > As such it can e.g. overflow if cast to u64. > So be careful to use int128 ops with it. You are right, but this is only the result of extracting "llend" calculation in its own function, since it is going to be used a third time in the next commit. This next commit contains a mistake because of this, as you pointed out. Since "last" would be a very misleading name, do you think we could give a better name / type to it? > Also - document? It will be documented with that ("It returns one byte beyond end of section" or similar) too. Thanks! > > > --- > > hw/virtio/vhost-vdpa.c | 18 +++--- > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index ea1aa71ad8..a1de6c7c9c 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -24,6 +24,15 @@ > > #include "trace.h" > > #include "qemu-common.h" > > > > +static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section) > > +{ > > +Int128 llend = int128_make64(section->offset_within_address_space); > > +llend = int128_add(llend, section->size); > > +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > + > > +return llend; > > +} > > + > > static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection > > *section) > > { > > return (!memory_region_is_ram(section->mr) && > > @@ -160,10 +169,7 @@ static void > > vhost_vdpa_listener_region_add(MemoryListener *listener, > > } > > > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > > -llend = int128_make64(section->offset_within_address_space); > > -llend = int128_add(llend, 
section->size); > > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > - > > +llend = vhost_vdpa_section_end(section); > > if (int128_ge(int128_make64(iova), llend)) { > > return; > > } > > @@ -221,9 +227,7 @@ static void > > vhost_vdpa_listener_region_del(MemoryListener *listener, > > } > > > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > > -llend = int128_make64(section->offset_within_address_space); > > -llend = int128_add(llend, section->size); > > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > +llend = vhost_vdpa_section_end(section); > > > > trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend)); > > > > -- > > 2.27.0 >
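To make the overflow concern raised in this review concrete, here is a small standalone sketch of the same "offset + size, rounded down to a page boundary" computation, once in 128 bits and once truncated to 64 bits. It is not QEMU code: unsigned __int128 (a GCC/Clang extension) stands in for QEMU's Int128, and the names u128, section_end and section_end_u64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang extension, used here as a stand-in for QEMU's Int128. */
typedef unsigned __int128 u128;

/*
 * Exclusive end of a section (one byte past its last byte), rounded
 * down to a page boundary. Computed in 128 bits so that a section
 * ending exactly at the top of the 64-bit space does not wrap.
 */
static u128 section_end(uint64_t offset, u128 size, uint64_t page_size)
{
    /* page_size must be a non-zero power of two for the mask to work */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    u128 end = (u128)offset + size;
    return end & ~((u128)page_size - 1);
}

/* The same computation truncated to 64 bits, to show the wrap-around. */
static uint64_t section_end_u64(uint64_t offset, uint64_t size,
                                uint64_t page_size)
{
    return (offset + size) & ~(page_size - 1);
}
```

For a one-page section starting at 0xFFFFFFFFFFFFF000 with 4 KiB pages, the 128-bit version yields 2^64 while the 64-bit version wraps to 0, which is why the result should stay in Int128 form until it has been compared.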
Re: [PATCH 3/3] vdpa: Check for iova range at mappings changes
On Tue, Oct 5, 2021 at 10:14 AM Michael S. Tsirkin wrote: > > On Tue, Oct 05, 2021 at 10:01:31AM +0200, Eugenio Pérez wrote: > > Check vdpa device range before updating memory regions so we don't add > > any outside of it, and report the invalid change if any. > > > > Signed-off-by: Eugenio Pérez > > --- > > include/hw/virtio/vhost-vdpa.h | 2 + > > hw/virtio/vhost-vdpa.c | 68 ++ > > hw/virtio/trace-events | 1 + > > 3 files changed, 55 insertions(+), 16 deletions(-) > > > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > > index a8963da2d9..c288cf7ecb 100644 > > --- a/include/hw/virtio/vhost-vdpa.h > > +++ b/include/hw/virtio/vhost-vdpa.h > > @@ -13,6 +13,7 @@ > > #define HW_VIRTIO_VHOST_VDPA_H > > > > #include "hw/virtio/virtio.h" > > +#include "standard-headers/linux/vhost_types.h" > > > > typedef struct VhostVDPAHostNotifier { > > MemoryRegion mr; > > @@ -24,6 +25,7 @@ typedef struct vhost_vdpa { > > uint32_t msg_type; > > bool iotlb_batch_begin_sent; > > MemoryListener listener; > > +struct vhost_vdpa_iova_range iova_range; > > struct vhost_dev *dev; > > VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; > > } VhostVDPA; > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index a1de6c7c9c..26d0258723 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -33,20 +33,34 @@ static Int128 vhost_vdpa_section_end(const > > MemoryRegionSection *section) > > return llend; > > } > > > > -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection > > *section) > > -{ > > -return (!memory_region_is_ram(section->mr) && > > -!memory_region_is_iommu(section->mr)) || > > -memory_region_is_protected(section->mr) || > > - /* vhost-vDPA doesn't allow MMIO to be mapped */ > > -memory_region_is_ram_device(section->mr) || > > - /* > > -* Sizing an enabled 64-bit BAR can cause spurious mappings to > > -* addresses in the upper part of the 64-bit address space. 
> > These > > -* are never accessed by the CPU and beyond the address width of > > -* some IOMMU hardware. TODO: VDPA should tell us the IOMMU > > width. > > -*/ > > - section->offset_within_address_space & (1ULL << 63); > > +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection > > *section, > > +uint64_t iova_min, > > +uint64_t iova_max) > > +{ > > +Int128 llend; > > +bool r = (!memory_region_is_ram(section->mr) && > > + !memory_region_is_iommu(section->mr)) || > > + memory_region_is_protected(section->mr) || > > + /* vhost-vDPA doesn't allow MMIO to be mapped */ > > + memory_region_is_ram_device(section->mr); > > +if (r) { > > +return true; > > +} > > + > > +if (section->offset_within_address_space < iova_min) { > > +error_report("RAM section out of device range (min=%lu, addr=%lu)", > > + iova_min, section->offset_within_address_space); > > +return true; > > +} > > + > > +llend = vhost_vdpa_section_end(section); > > +if (int128_make64(llend) > iova_max) { > > I am puzzled by this. > You are taking a Int128, converting to u64, converting > back to Int128, and comparing to u64. > Head spins. What is all this back and forth trying to achieve? > You are totally right, this series was extracted from a longer one where I didn't use vhost_vdpa_section_end, but raw addresses. Then I applied int128_make64 to the wrong variable, too fast. To be sure we are on the same page, to do: if (int128_ge(int128_make64(iova), llend)) { // error message return; } The same way as vhost_vdpa_listener_region_{add,del} would be ok? Thanks! 
> > +error_report("RAM section out of device range (max=%lu, end > > addr=%lu)", > > + iova_max, (uint64_t)int128_make64(llend)); > > +return true; > > +} > > + > > +return false; > > } > > > > static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr > > size, > > @@ -158,7 +172,8 @@ static void > > vhost_vdpa_listener_region_add(MemoryListener *listener, > > void *vaddr; > > int ret; > > > > -if (vhost_vdpa_listener_skipped_section(section)) { > > +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first, > > +v->iova_range.last)) { > > return; > > } > > > > @@ -216,7 +231,8 @@ static void > > vhost_vdpa_listener_region_del(MemoryListener *listener, > > Int128 llend, llsize; > > int ret; > > > > -if (vhost_vdpa_listener_skipped_section(section)) { > > +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first, > >
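One way to express the range check discussed above without converting back and forth between 64 and 128 bits is sketched below. This is a standalone illustration under stated assumptions, not the code that was eventually merged: unsigned __int128 (a GCC/Clang extension) stands in for Int128, the upper bound of the device range is assumed to be inclusive, and the names u128 and section_in_iova_range are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GCC/Clang extension, used here as a stand-in for QEMU's Int128. */
typedef unsigned __int128 u128;

/*
 * Return true if [start, end_exclusive) lies inside the device range
 * [iova_first, iova_last]. iova_last is assumed to be inclusive, so
 * the exclusive end is compared against iova_last + 1, widened to
 * 128 bits so that iova_last == UINT64_MAX does not wrap.
 */
static bool section_in_iova_range(uint64_t start, u128 end_exclusive,
                                  uint64_t iova_first, uint64_t iova_last)
{
    assert(iova_first <= iova_last);

    if (start < iova_first) {
        return false; /* section begins below the device range */
    }
    return end_exclusive <= (u128)iova_last + 1;
}
```

The comparison is done once, entirely in the wide type, so the exclusive end never has to be narrowed before it is checked.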
Re: [PATCH v4 10/11] tests/acpi: add expected blob for VIOT test on virt machine
On Fri, 1 Oct 2021, Jean-Philippe Brucker wrote: > The VIOT blob contains the following: > > [000h 4]Signature : "VIOT"[Virtual I/O > Translation Table] > [004h 0004 4] Table Length : 0058 > [008h 0008 1] Revision : 00 > [009h 0009 1] Checksum : 66 > [00Ah 0010 6] Oem ID : "BOCHS " > [010h 0016 8] Oem Table ID : "BXPC" > [018h 0024 4] Oem Revision : 0001 > [01Ch 0028 4] Asl Compiler ID : "BXPC" > [020h 0032 4]Asl Compiler Revision : 0001 > > [024h 0036 2] Node count : 0002 > [026h 0038 2] Node offset : 0030 > [028h 0040 8] Reserved : > > [030h 0048 1] Type : 03 [VirtIO-PCI IOMMU] > [031h 0049 1] Reserved : 00 > [032h 0050 2] Length : 0010 > > [034h 0052 2] PCI Segment : > [036h 0054 2] PCI BDF number : 0008 > [038h 0056 8] Reserved : > > [040h 0064 1] Type : 01 [PCI Range] > [041h 0065 1] Reserved : 00 > [042h 0066 2] Length : 0018 > > [044h 0068 4] Endpoint start : > [048h 0072 2]PCI Segment start : > [04Ah 0074 2] PCI Segment end : > [04Ch 0076 2]PCI BDF start : > [04Eh 0078 2] PCI BDF end : 00FF > [050h 0080 2] Output node : 0030 > [052h 0082 6] Reserved : > > Signed-off-by: Jean-Philippe Brucker Acked-by: Ani Sinha Without looking at the other patches, the disassembly looks good (with latest iasl from upstream git). One suggestion : maybe also add the raw table data as well of length 88. 
> --- > tests/qtest/bios-tables-test-allowed-diff.h | 1 - > tests/data/acpi/virt/VIOT | Bin 0 -> 88 bytes > 2 files changed, 1 deletion(-) > > diff --git a/tests/qtest/bios-tables-test-allowed-diff.h > b/tests/qtest/bios-tables-test-allowed-diff.h > index 29b5b1eabc..fa213e4738 100644 > --- a/tests/qtest/bios-tables-test-allowed-diff.h > +++ b/tests/qtest/bios-tables-test-allowed-diff.h > @@ -1,4 +1,3 @@ > /* List of comma-separated changed AML files to ignore */ > -"tests/data/acpi/virt/VIOT", > "tests/data/acpi/q35/DSDT.viot", > "tests/data/acpi/q35/VIOT.viot", > diff --git a/tests/data/acpi/virt/VIOT b/tests/data/acpi/virt/VIOT > index > e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..921f40d88c28ba2171a4d664e119914335309e7d > 100644 > GIT binary patch > literal 88 > zcmWIZ^bd((0D?3pe`k+i1*eDrX9XZ&1PX!JAexE60Hgv8m>C3sGzXN&z`)2L0cSHX > I{D-Rq0Q5fy0RR91 > > literal 0 > HcmV?d1 > > -- > 2.33.0 > >
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Mon, Oct 04 2021, "Michael S. Tsirkin" wrote: > On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote: >> --8<- >> >> From: Halil Pasic >> Date: Thu, 30 Sep 2021 02:38:47 +0200 >> Subject: [PATCH] virtio: write back feature VERSION_1 before verify >> >> This patch fixes a regression introduced by commit 82e89ea077b9 >> ("virtio-blk: Add validation for block size in config space") and >> enables similar checks in verify() on big endian platforms. >> >> The problem with checking multi-byte config fields in the verify >> callback, on big endian platforms, and with a possibly transitional >> device is the following. The verify() callback is called between >> config->get_features() and virtio_finalize_features(). That we have a >> device that offered F_VERSION_1 then we have the following options >> either the device is transitional, and then it has to present the legacy >> interface, i.e. a big endian config space until F_VERSION_1 is >> negotiated, or we have a non-transitional device, which makes >> F_VERSION_1 mandatory, and only implements the non-legacy interface and >> thus presents a little endian config space. Because at this point we >> can't know if the device is transitional or non-transitional, we can't >> know do we need to byte swap or not. > > Well we established that we can know. Here's an alternative explanation: > > The virtio specification virtio-v1.1-cs01 states: > > Transitional devices MUST detect Legacy drivers by detecting that > VIRTIO_F_VERSION_1 has not been acknowledged by the driver. > This is exactly what QEMU as of 6.1 has done relying solely > on VIRTIO_F_VERSION_1 for detecting that. > > However, the specification also says: > driver MAY read (but MUST NOT write) the device-specific > configuration fields to check that it can support the device before > accepting it. > > In that case, any device relying solely on VIRTIO_F_VERSION_1 > for detecting legacy drivers will return data in legacy format. 
> In particular, this implies that it is in big endian format > for big endian guests. This naturally confuses the driver > which expects little endian in the modern mode. > > It is probably a good idea to amend the spec to clarify that > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation > is complete. However, we already have regression so let's > try to address it. I prefer that explanation. > > >> >> The virtio spec explicitly states that the driver MAY read config >> between reading and writing the features so saying that first accessing >> the config before feature negotiation is done is not an option. The >> specification ain't clear about setting the features multiple times >> before FEATURES_OK, so I guess that should be fine to set F_VERSION_1 >> since at this point we already know that we are about to negotiate >> F_VERSION_1. >> >> I don't consider this patch super clean, but frankly I don't think we >> have a ton of options. Another option that may or man not be cleaner, >> but is also IMHO much uglier is to figure out whether the device is >> transitional by rejecting _F_VERSION_1, then resetting it and proceeding >> according tho what we have figured out, hoping that the characteristics >> of the device didn't change. > > An empty line before tags. > >> Signed-off-by: Halil Pasic >> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config >> space") >> Reported-by: mark...@us.ibm.com > > Let's add more commits that are affected. E.g. virtio-net with MTU > feature bit set is affected too. > > So let's add Fixes tag for: > commit 14de9d114a82a564b94388c95af79a701dc93134 > Author: Aaron Conole > Date: Fri Jun 3 16:57:12 2016 -0400 > > virtio-net: Add initial MTU advice feature > > I think that's all, but pls double check me. I could not find anything else after a quick check. 
> > >> --- >> drivers/virtio/virtio.c | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c >> index 0a5b54034d4b..2b9358f2e22a 100644 >> --- a/drivers/virtio/virtio.c >> +++ b/drivers/virtio/virtio.c >> @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d) >> driver_features_legacy = driver_features; >> } >> >> +/* Write F_VERSION_1 feature to pin down endianness */ >> +if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) { >> +dev->features = (1ULL << VIRTIO_F_VERSION_1); >> +dev->config->finalize_features(dev); >> +} >> + >> if (device_features & (1ULL << VIRTIO_F_VERSION_1)) >> dev->features = driver_features & device_features; >> else >> -- >> 2.31.1 I think we should go with this just to fix the nasty regression for now.
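The endianness problem described in this thread can be illustrated with a short standalone sketch: the same four config-space bytes decode to different values depending on whether VIRTIO_F_VERSION_1 has been acknowledged, when the guest is big endian and the device relies solely on that bit. This is a model of the behaviour under discussion, not driver or QEMU code. The helper names load_le32, load_be32 and config_read32 are hypothetical; only the feature bit number 32 is taken from the virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* feature bit number from the virtio spec */

/* Decode four bytes as a little-endian 32-bit value (hypothetical helper). */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode four bytes as a big-endian 32-bit value (hypothetical helper). */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Value of a 32-bit config field as laid out by a device that keys
 * its layout only on VERSION_1: little endian once the bit has been
 * acknowledged, guest-native endian (legacy interface) before that.
 */
static uint32_t config_read32(const uint8_t *raw, uint64_t acked_features,
                              bool guest_big_endian)
{
    bool modern = (acked_features & (1ULL << VIRTIO_F_VERSION_1)) != 0;

    if (modern || !guest_big_endian) {
        return load_le32(raw);
    }
    return load_be32(raw);
}
```

For the bytes 00 02 00 00, a little-endian reading gives 512 while the legacy big-endian reading gives 0x00020000, which is the kind of mismatch a verify() callback hits if it reads config before VERSION_1 has been written back.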
Re: [PATCH v4 11/11] tests/acpi: add expected blobs for VIOT test on q35 machine
On Fri, 1 Oct 2021, Jean-Philippe Brucker wrote: > Add expected blobs of the VIOT and DSDT table for the VIOT test on the > q35 machine. > > Since the test instantiates a virtio device and two PCIe expander > bridges, DSDT.viot has more blocks than the base DSDT (long diff not > shown here). For documentation and bisection of issues in future, I think its better to provide the DSDT table ASL diff here as well. >The VIOT table generated for the q35 test is: > > [000h 4]Signature : "VIOT"[Virtual I/O > Translation Table] > [004h 0004 4] Table Length : 0070 > [008h 0008 1] Revision : 00 > [009h 0009 1] Checksum : 3D > [00Ah 0010 6] Oem ID : "BOCHS " > [010h 0016 8] Oem Table ID : "BXPC" > [018h 0024 4] Oem Revision : 0001 > [01Ch 0028 4] Asl Compiler ID : "BXPC" > [020h 0032 4]Asl Compiler Revision : 0001 > > [024h 0036 2] Node count : 0003 > [026h 0038 2] Node offset : 0030 > [028h 0040 8] Reserved : > > [030h 0048 1] Type : 03 [VirtIO-PCI IOMMU] > [031h 0049 1] Reserved : 00 > [032h 0050 2] Length : 0010 > > [034h 0052 2] PCI Segment : > [036h 0054 2] PCI BDF number : 0010 > [038h 0056 8] Reserved : > > [040h 0064 1] Type : 01 [PCI Range] > [041h 0065 1] Reserved : 00 > [042h 0066 2] Length : 0018 > > [044h 0068 4] Endpoint start : 3000 > [048h 0072 2]PCI Segment start : > [04Ah 0074 2] PCI Segment end : > [04Ch 0076 2]PCI BDF start : 3000 > [04Eh 0078 2] PCI BDF end : 30FF > [050h 0080 2] Output node : 0030 > [052h 0082 6] Reserved : > > [058h 0088 1] Type : 01 [PCI Range] > [059h 0089 1] Reserved : 00 > [05Ah 0090 2] Length : 0018 > > [05Ch 0092 4] Endpoint start : 1000 > [060h 0096 2]PCI Segment start : > [062h 0098 2] PCI Segment end : > [064h 0100 2]PCI BDF start : 1000 > [066h 0102 2] PCI BDF end : 10FF > [068h 0104 2] Output node : 0030 > [06Ah 0106 6] Reserved : > > Signed-off-by: Jean-Philippe Brucker > --- > tests/qtest/bios-tables-test-allowed-diff.h | 2 -- > tests/data/acpi/q35/DSDT.viot | Bin 0 -> 9398 bytes > tests/data/acpi/q35/VIOT.viot | Bin 0 -> 
112 bytes > 3 files changed, 2 deletions(-) > > diff --git a/tests/qtest/bios-tables-test-allowed-diff.h > b/tests/qtest/bios-tables-test-allowed-diff.h > index fa213e4738..dfb8523c8b 100644 > --- a/tests/qtest/bios-tables-test-allowed-diff.h > +++ b/tests/qtest/bios-tables-test-allowed-diff.h > @@ -1,3 +1 @@ > /* List of comma-separated changed AML files to ignore */ > -"tests/data/acpi/q35/DSDT.viot", > -"tests/data/acpi/q35/VIOT.viot", > diff --git a/tests/data/acpi/q35/DSDT.viot b/tests/data/acpi/q35/DSDT.viot > index > e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b41270ff6d63493c2ae379ddd1d3e28f190a6c01 > 100644 > GIT binary patch > literal 9398 > zcmeHNO>7&-8J*>iv|O&FB}G~OOGG$M|57BBoWHhc5OS9yDTx$CQgH$r;8Idr*-4Q_ > z5(9Az1F`}niVsB-#zBvCpa8wKr(7GLxfJNZhXOUwQxCo5S`_gq>icGPq#2R|qEj!C > zfZhFO-E)yj*v)_%j$|bWD4v61&3MJ6@sGF_Mv( > z(Y~GJ$Ji9i%ul_-ddc|xw*RT`zx{!4bOW~WnR9oe8@#vYZ!iK~-v}&=4xHj-r&;K< > zcU`OQR&r*iT=DGueakdEt~iRCoxImzW@o+PvCPVNXSM0Z?!3la@A7=V7VmARrY)yk > z{pY1`=FY$P>E*ZcU;gqRzq<396$4-adlUOh0d4%7zBT9fosWB0jax+L=jQv zQRdK@z^9UXwkV>i=J#J~?>_G}@-A=VM7>texw(0?%WX7MbJqC}W*M`obLj6+2L}g# > z7KhBa!JMioR2I#0z1Wf}4QL}(?VWPHRb@6~_rFcDSo^j^@$^f@nwPCNyiPXrY^T}E > zvw%wcfQq{B`j+GO?T>ms>-oupgMHSY{HWJupLA{Zum8sP*}gR;+Lp2=-%n6m?tjZ- > zjG;9@c#>K}{oUR@TWRJyyo-^34o#_78fy{Dw`^y5>Zzy%5~{uX^m4%iSX`qhT8~!A > zG^eeZlHoI-8Ai$2Vq4f>h#*^g_hNN*{g5>^t+7liet~+Zy}PhdZ_UfPW8!)n8rHEU > zO2#|UccP|wVTaee;I38=IdP!Tn zmOARAox0m>8Og6~%fzLjz(wD!XR-0J?VV zfm^7pSF`ns_j0yv6jt12mU+DH7MCLJ$0#~D2(}3k+%T>(s-yiwD&A+AC-UHoLQ!1- > zZTt}HXS}hx*Q`$VSHhuj|GB^ZyZOw!)sJSsuAcdeTMekL*MH;pAM0IX{WHC*Rs z7Qc^d+_nd7KNU4@(}vxf?a%bCS>r)E9$^!#8~A%`e2rz2YvijNQTB2(~G5e*20+ > zH;dzb%?EP5(W z08WZ?oCl~3iHZ6-Ho}>}h7mC(G{QI&P|ie1Otgk$qns&Q5M{)a(5PSn%9#j>DYIZ) > z2`sNC#+ect6HM87gsRTCrZdi&5*imw*?5Gi&M{5r7-vf8n649{s&ib^Ij-p(*L5OP > zb()$^Q`2ecIuWWm@dQ$OI-%)I=sFRqIxS77rRlVEod{K(Nlj-`)0xzDB2;zaS*To3 > zThnRlIuWWmCp4WCn$8JbCqh-{q^5IH(>bZ@M5yYV(sWK~I;V7<2vwbqrqj`MI=W7T > zs?LqMyPoYr(sYdWWOod{K(8BJ$K)0xqAB2;zGXgX&! 
> zoin;lgsRR{n$A<2&QrQhgsM)=Byji1=g_RCb5_@hP}O-_(|KCcd0N+rP}O;cGxOn- > z@C;`b!iU`%!E}#8VtOI=tj0X6G0*Bugevo##
[PATCH] ui/gtk: Update the refresh rate for gl-area too
This is a bugfix that stretches all the way back to January 2020, when I first raised this problem and potential solutions.

A quick recap of the issue: QEMU did not sync up with the monitor's refresh rate, causing the VM to render frames that were NOT displayed to the user. That "fix" allowed QEMU to obtain the screen refresh rate information from the system using GDK APIs and was for GTK only.

Well, I'm back with the same issue again, but this time on Wayland. I did NOT realize there was YET another screen refresh rate function, this time for Wayland specifically. Thankfully the fix was simple and without much hassle.

Thanks,
Nikola

Signed-off-by: Nikola Pavlica
---
 ui/gtk-gl-area.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index b23523748e..afcb29f658 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -112,6 +112,9 @@ void gd_gl_area_refresh(DisplayChangeListener *dcl)
 {
     VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+    vc->gfx.dcl.update_interval = gd_monitor_update_interval(
+            vc->window ? vc->window : vc->gfx.drawing_area);
+
     if (!vc->gfx.gls) {
         if (!gtk_widget_get_realized(vc->gfx.drawing_area)) {
             return;
-- 
2.33.0
Re: [PATCH v1 2/2] migration: add missing qemu_mutex_lock_iothread in migration_completion
* Emanuele Giuseppe Esposito (eespo...@redhat.com) wrote:
> qemu_savevm_state_complete_postcopy assumes the iothread lock (BQL)
> to be held, but instead it isn't.
>
> Signed-off-by: Emanuele Giuseppe Esposito

Interesting, I think you're right - and I think it's been missing from the start.

Reviewed-by: Dr. David Alan Gilbert

> ---
>  migration/migration.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 041b8451a6..215d5281f2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3182,7 +3182,10 @@ static void migration_completion(MigrationState *s)
>      } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>          trace_migration_completion_postcopy_end();
>
> +        qemu_mutex_lock_iothread();
>          qemu_savevm_state_complete_postcopy(s->to_dst_file);
> +        qemu_mutex_unlock_iothread();
> +
>          trace_migration_completion_postcopy_end_after_complete();
>      } else if (s->state == MIGRATION_STATUS_CANCELLING) {
>          goto fail;
> --
> 2.27.0
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
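The pattern in this patch (a callee that assumes a global lock is held, and a caller that runs outside of it) can be sketched in isolation as follows. This is a minimal stand-alone illustration using a pthread mutex, not QEMU code: `big_lock`, `big_lock_held`, `complete_postcopy_step` and `completion_path` are hypothetical names, and the held flag is only meaningful in a single-threaded demo.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the big lock; all names here are hypothetical. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool big_lock_held;   /* only meaningful for this single-threaded demo */
static int completed;

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
}

static void big_lock_release(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
}

/* Callee that checks its locking precondition instead of silently assuming it. */
static void complete_postcopy_step(void)
{
    assert(big_lock_held);
    completed++;
}

/* A caller that runs outside the lock must take it around the call. */
static void completion_path(void)
{
    big_lock_acquire();
    complete_postcopy_step();
    big_lock_release();
}
```

Asserting the precondition in the callee is one way to catch a missing lock early rather than years later.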
Re: [PATCH v4 09/11] tests/acpi: add test cases for VIOT
On Fri, 1 Oct 2021, Jean-Philippe Brucker wrote: > Add two test cases for VIOT, one on the q35 machine and the other on > virt. To test complex topologies the q35 test has two PCIe buses that > bypass the IOMMU (and are therefore not described by VIOT), and two > buses that are translated by virtio-iommu. > > Signed-off-by: Jean-Philippe Brucker This might be a stupid question but what about virtio-mmio and single mmio cases? I see none of your tables has nodes for those and here too you do not add test cases for it. > --- > tests/qtest/bios-tables-test.c | 38 ++ > 1 file changed, 38 insertions(+) > > diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c > index 4f11d03055..b6cb383bd9 100644 > --- a/tests/qtest/bios-tables-test.c > +++ b/tests/qtest/bios-tables-test.c > @@ -1403,6 +1403,42 @@ static void test_acpi_virt_tcg(void) > free_test_data(&data); > } > > +static void test_acpi_q35_viot(void) > +{ > +test_data data = { > +.machine = MACHINE_Q35, > +.variant = ".viot", > +}; > + > +/* > + * To keep things interesting, two buses bypass the IOMMU. > + * VIOT should only describes the other two buses. 
> + */ > +test_acpi_one("-machine default_bus_bypass_iommu=on " > + "-device virtio-iommu " > + "-device pxb-pcie,bus_nr=0x10,id=pcie.100,bus=pcie.0 " > + "-device > pxb-pcie,bus_nr=0x20,id=pcie.200,bus=pcie.0,bypass_iommu=on " > + "-device pxb-pcie,bus_nr=0x30,id=pcie.300,bus=pcie.0", > + &data); > +free_test_data(&data); > +} > + > +static void test_acpi_virt_viot(void) > +{ > +test_data data = { > +.machine = "virt", > +.uefi_fl1 = "pc-bios/edk2-aarch64-code.fd", > +.uefi_fl2 = "pc-bios/edk2-arm-vars.fd", > +.cd = > "tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2", > +.ram_start = 0x4000ULL, > +.scan_len = 128ULL * 1024 * 1024, > +}; > + > +test_acpi_one("-cpu cortex-a57 " > + "-device virtio-iommu", &data); > +free_test_data(&data); > +} > + > static void test_oem_fields(test_data *data) > { > int i; > @@ -1567,12 +1603,14 @@ int main(int argc, char *argv[]) > if (strcmp(arch, "x86_64") == 0) { > qtest_add_func("acpi/microvm/pcie", test_acpi_microvm_pcie_tcg); > } > +qtest_add_func("acpi/q35/viot", test_acpi_q35_viot); > } else if (strcmp(arch, "aarch64") == 0) { > qtest_add_func("acpi/virt", test_acpi_virt_tcg); > qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem); > qtest_add_func("acpi/virt/memhp", test_acpi_virt_tcg_memhp); > qtest_add_func("acpi/virt/pxb", test_acpi_virt_tcg_pxb); > qtest_add_func("acpi/virt/oem-fields", test_acpi_oem_fields_virt); > +qtest_add_func("acpi/virt/viot", test_acpi_virt_viot); > } > ret = g_test_run(); > boot_sector_cleanup(disk); > -- > 2.33.0 > >
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Mon, 4 Oct 2021 05:07:13 -0400 "Michael S. Tsirkin" wrote: > On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote: > > On Sat, 2 Oct 2021 14:13:37 -0400 > > "Michael S. Tsirkin" wrote: > > > > > > Anyone else have an idea? This is a nasty regression; we could revert > > > > the > > > > patch, which would remove the symptoms and give us some time, but that > > > > doesn't really feel right, I'd do that only as a last resort. > > > > > > Well we have Halil's hack (except I would limit it > > > to only apply to BE, only do devices with validate, > > > and only in modern mode), and we will fix QEMU to be spec compliant. > > > Between these why do we need any conditional compiles? > > > > We don't. As I stated before, this hack is flawed because it > > effectively breaks fencing features by the driver with QEMU. Some > > features can not be unset after once set, because we tend to try to > > enable the corresponding functionality whenever we see a write > > features operation with the feature bit set, and we don't disable, if a > > subsequent features write operation stores the feature bit as not set. > > Something to fix in QEMU too, I think. Possibly. But it is the same situation: it probably has a long history. And it may even make some sense. The obvious trigger for doing the conditional initialization for modern is the setting of FEATURES_OK. The problem is, legacy doesn't do FEATURES_OK. So we would need a different trigger. > > > But it looks like VIRTIO_1 is fine to get cleared afterwards. > > We'd never clear it though - why would we? > Right. > > So my hack > > should actually look like posted below, modulo conditions. > > > Looking at it some more, I see that vhost-user actually > does not send features to the backend until FEATURES_OK. I.e. the hack does not work for transitional vhost-user devices, but it doesn't break them either. 
Furthermore, I believe there is not much we can do to support transitional devices with vhost-user and similar, without extending the protocol. The transport specific detection idea would need a new vhost-user thingy to tell the device what has been figured out, right? In theory modern only could work, if the backends were paying extra attention to endianness, instead of just assuming that the code is running little-endian. > However, the code in contrib for vhost-user-blk at least seems > broken wrt endian-ness ATM. Agree. For example config is native endian ATM AFAICT. > What about other backends though? I think whenever the config is owned and managed by the vhost-backend we have a problem with transitional. And we don't have everything in the protocol to deal with this problem. I didn't check modern for the different vhost-user backends. I don't think we recommend our users on s390 to use those. My understanding of the use-cases is far form complete. > Hard to be sure right? I agree. > Cc Raphael and Stefan so they can take a look. > And I guess it's time we CC'd qemu-devel too. > > For now I am beginning to think we should either revert or just limit > validation to LE and think about all this some more. And I am inclining > to do a revert. I'm fine with either of these as a quick fix, but we will eventually have to find a solution. AFAICT this solution works for the s390 setups we care about the most, but so would a revert. > These are all hypervisors that shipped for a long time. > Do we need a flag for early config space access then? You mean a feature bit? I think it is a good idea even if it weren't strictly necessary. We will have a behavior change for some devices, and I think the ability to detect those is valuable. Your spec change proposal, makes it IMHO pretty clear, that we are changing our understanding of how transitional should work. Strictly, transitional is not a normative part of the spec AFAIU, but still... 
> > > > > > > Regarding the conditions I guess checking that driver_features has > > F_VERSION_1 already satisfies "only modern mode", or? > > Right. > > > For now > > I've deliberately omitted the has verify and the is big endian > > conditions so we have a better chance to see if something breaks > > (i.e. the approach does not work). I can add in those extra conditions > > later. > > Or maybe if we will go down that road just the verify check (for > performance). I'm a bit unhappy we have the extra exit but consistency > seems more important. > I'm fine either way. The extra exit is only for the initialization and one per 1 device, I have no feeling if this has a measurable performance impact. > > > > --8<- > > > > From: Halil Pasic > > Date: Thu, 30 Sep 2021 02:38:47 +0200 > > Subject: [PATCH] virtio: write back feature VERSION_1 before verify > > > > This patch fixes a regression introduced by commit 82e89ea077b9 > > ("virtio-blk: Add validation for block size in config space") and > > enables similar checks in verif
Re: [PATCH 3/3] vdpa: Check for iova range at mappings changes
On Tue, Oct 05, 2021 at 11:58:12AM +0200, Eugenio Perez Martin wrote: > On Tue, Oct 5, 2021 at 10:14 AM Michael S. Tsirkin wrote: > > > > On Tue, Oct 05, 2021 at 10:01:31AM +0200, Eugenio Pérez wrote: > > > Check vdpa device range before updating memory regions so we don't add > > > any outside of it, and report the invalid change if any. > > > > > > Signed-off-by: Eugenio Pérez > > > --- > > > include/hw/virtio/vhost-vdpa.h | 2 + > > > hw/virtio/vhost-vdpa.c | 68 ++ > > > hw/virtio/trace-events | 1 + > > > 3 files changed, 55 insertions(+), 16 deletions(-) > > > > > > diff --git a/include/hw/virtio/vhost-vdpa.h > > > b/include/hw/virtio/vhost-vdpa.h > > > index a8963da2d9..c288cf7ecb 100644 > > > --- a/include/hw/virtio/vhost-vdpa.h > > > +++ b/include/hw/virtio/vhost-vdpa.h > > > @@ -13,6 +13,7 @@ > > > #define HW_VIRTIO_VHOST_VDPA_H > > > > > > #include "hw/virtio/virtio.h" > > > +#include "standard-headers/linux/vhost_types.h" > > > > > > typedef struct VhostVDPAHostNotifier { > > > MemoryRegion mr; > > > @@ -24,6 +25,7 @@ typedef struct vhost_vdpa { > > > uint32_t msg_type; > > > bool iotlb_batch_begin_sent; > > > MemoryListener listener; > > > +struct vhost_vdpa_iova_range iova_range; > > > struct vhost_dev *dev; > > > VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; > > > } VhostVDPA; > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > > index a1de6c7c9c..26d0258723 100644 > > > --- a/hw/virtio/vhost-vdpa.c > > > +++ b/hw/virtio/vhost-vdpa.c > > > @@ -33,20 +33,34 @@ static Int128 vhost_vdpa_section_end(const > > > MemoryRegionSection *section) > > > return llend; > > > } > > > > > > -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection > > > *section) > > > -{ > > > -return (!memory_region_is_ram(section->mr) && > > > -!memory_region_is_iommu(section->mr)) || > > > -memory_region_is_protected(section->mr) || > > > - /* vhost-vDPA doesn't allow MMIO to be mapped */ > > > -memory_region_is_ram_device(section->mr) || > > > - 
/* > > > -* Sizing an enabled 64-bit BAR can cause spurious mappings to > > > -* addresses in the upper part of the 64-bit address space. > > > These > > > -* are never accessed by the CPU and beyond the address width > > > of > > > -* some IOMMU hardware. TODO: VDPA should tell us the IOMMU > > > width. > > > -*/ > > > - section->offset_within_address_space & (1ULL << 63); > > > +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection > > > *section, > > > +uint64_t iova_min, > > > +uint64_t iova_max) > > > +{ > > > +Int128 llend; > > > +bool r = (!memory_region_is_ram(section->mr) && > > > + !memory_region_is_iommu(section->mr)) || > > > + memory_region_is_protected(section->mr) || > > > + /* vhost-vDPA doesn't allow MMIO to be mapped */ > > > + memory_region_is_ram_device(section->mr); > > > +if (r) { > > > +return true; > > > +} > > > + > > > +if (section->offset_within_address_space < iova_min) { > > > +error_report("RAM section out of device range (min=%lu, > > > addr=%lu)", > > > + iova_min, section->offset_within_address_space); > > > +return true; > > > +} > > > + > > > +llend = vhost_vdpa_section_end(section); > > > +if (int128_make64(llend) > iova_max) { > > > > I am puzzled by this. > > You are taking a Int128, converting to u64, converting > > back to Int128, and comparing to u64. > > Head spins. What is all this back and forth trying to achieve? > > > > You are totally right, this series was extracted from a longer one > where I didn't use vhost_vdpa_section_end, but raw addresses. Then I > applied int128_make64 to the wrong variable, too fast. > > To be sure we are on the same page, to do: > > if (int128_ge(int128_make64(iova), llend)) { > // error message > return; > } > > The same way as vhost_vdpa_listener_region_{add,del} would be ok? > > Thanks! 
should be ok, yea > > > +error_report("RAM section out of device range (max=%lu, end > > > addr=%lu)", > > > + iova_max, (uint64_t)int128_make64(llend)); > > > +return true; > > > +} > > > + > > > +return false; > > > } > > > > > > static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr > > > size, > > > @@ -158,7 +172,8 @@ static void > > > vhost_vdpa_listener_region_add(MemoryListener *listener, > > > void *vaddr; > > > int ret; > > > > > > -if (vhost_vdpa_listener_skipped_section(section)) { > > > +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first, > > > +
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, 5 Oct 2021 03:53:17 -0400
"Michael S. Tsirkin" wrote:

> > Wouldn't a call from transport code into virtio core
> > be more handy? What I have in mind is stuff like vhost-user and vdpa. My
> > understanding is, that for vhost setups where the config is outside qemu,
> > we probably need a new command that tells the vhost backend what
> > endianness to use for config. I don't think we can use
> > VHOST_USER_SET_VRING_ENDIAN because that one is on a virtqueue basis
> > according to the doc. So for vhost-user and similar we would fire that
> > command and probably also set the field, while for devices for which
> > control plane is handled by QEMU we would just set the field.
> >
> > Does that sound about right?
>
> I'm fine either way, but when would you invoke this?
> With my idea backends can check the field when get_config
> is invoked.
>
> As for using this in VHOST, can we maybe re-use SET_FEATURES?
>
> Kind of hacky but nice in that it will actually make existing backends
> work...

Basically the equivalent of this patch, just on the vhost interface, right? Could work; I have to look into it :)
Re: [PATCH 2/3] vdpa: Add vhost_vdpa_section_end
On Tue, Oct 05, 2021 at 11:52:37AM +0200, Eugenio Perez Martin wrote: > On Tue, Oct 5, 2021 at 10:15 AM Michael S. Tsirkin wrote: > > > > On Tue, Oct 05, 2021 at 10:01:30AM +0200, Eugenio Pérez wrote: > > > Abstract this operation, that will be reused when validating the region > > > against the iova range that the device supports. > > > > > > Signed-off-by: Eugenio Pérez > > > > Note that as defined end is actually 1 byte beyond end of section. > > As such it can e.g. overflow if cast to u64. > > So be careful to use int128 ops with it. > > You are right, but this is only the result of extracting "llend" > calculation in its own function, since it is going to be used a third > time in the next commit. This next commit contains a mistake because > of this, as you pointed out. > > Since "last" would be a very misleading name, do you think we could > give a better name / type to it? > > > Also - document? > > It will be documented with that ("It returns one byte beyond end of > section" or similar) too. > > Thanks! that's how c++ containers work so maybe it's not too bad as long as we document this carefully. 
> > > > > --- > > > hw/virtio/vhost-vdpa.c | 18 +++--- > > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > > index ea1aa71ad8..a1de6c7c9c 100644 > > > --- a/hw/virtio/vhost-vdpa.c > > > +++ b/hw/virtio/vhost-vdpa.c > > > @@ -24,6 +24,15 @@ > > > #include "trace.h" > > > #include "qemu-common.h" > > > > > > +static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section) > > > +{ > > > +Int128 llend = int128_make64(section->offset_within_address_space); > > > +llend = int128_add(llend, section->size); > > > +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > > + > > > +return llend; > > > +} > > > + > > > static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection > > > *section) > > > { > > > return (!memory_region_is_ram(section->mr) && > > > @@ -160,10 +169,7 @@ static void > > > vhost_vdpa_listener_region_add(MemoryListener *listener, > > > } > > > > > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > > > -llend = int128_make64(section->offset_within_address_space); > > > -llend = int128_add(llend, section->size); > > > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > > - > > > +llend = vhost_vdpa_section_end(section); > > > if (int128_ge(int128_make64(iova), llend)) { > > > return; > > > } > > > @@ -221,9 +227,7 @@ static void > > > vhost_vdpa_listener_region_del(MemoryListener *listener, > > > } > > > > > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > > > -llend = int128_make64(section->offset_within_address_space); > > > -llend = int128_add(llend, section->size); > > > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > > +llend = vhost_vdpa_section_end(section); > > > > > > trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend)); > > > > > > -- > > > 2.27.0 > >
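The point about the end being one byte past the section, and therefore overflowing a u64, can be illustrated with a small stand-alone sketch. It is not the QEMU implementation: it uses the GCC/Clang `unsigned __int128` extension in place of QEMU's `Int128`, skips the page-mask step, and `section_end` / `section_in_range` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 128-bit unsigned type; a GCC/Clang extension (assumption of this sketch). */
typedef unsigned __int128 u128;

/* One past the last byte of [start, start + size); may equal 2^64. */
static u128 section_end(uint64_t start, u128 size)
{
    return (u128)start + size;
}

/* True if every byte of the section lies within [iova_min, iova_max]. */
static bool section_in_range(uint64_t start, u128 size,
                             uint64_t iova_min, uint64_t iova_max)
{
    u128 end = section_end(start, size);

    /* Compare in 128 bits: iova_max is inclusive, end is exclusive. */
    return start >= iova_min && end <= (u128)iova_max + 1;
}
```

For a section that covers the last page of the 64-bit address space, the exclusive end is exactly 2^64, which truncates to 0 when cast to `uint64_t`. That is why the comparison is done in the wider type.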
Re: [PATCH 06/11] qdev: Add Error parameter to qdev_set_id()
On 27.09.2021 at 12:33, Damien Hedde wrote:
> Hi Kevin,
>
> I proposed a very similar patch in our rfc series because we needed some of
> the cleaning you do here.
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg05679.html
> I've added a bit of doc for the function, feel free to take it if you want.

Thanks, I'm replacing my patch with yours for v2.

Kevin
Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Tuesday, 5 October 2021 09:38:53 CEST David Hildenbrand wrote:
> On 04.10.21 21:38, Christian Schoenebeck wrote:
> > At the moment the maximum transfer size with virtio is limited to 4M
> > (1024 * PAGE_SIZE). This series raises this limit to its maximum
> > theoretical possible transfer size of 128M (32k pages) according to the
> > virtio specs:
> >
> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
>
> I'm missing the "why do we care". Can you comment on that?

Primary motivation is the possibility of improved performance, e.g. in case of 9pfs, people can raise the maximum transfer size with the Linux 9p client's 'msize' option on guest side (and only on guest side actually). If guest performs large chunk I/O, e.g. consider something "useful" like this one on guest side:

    time cat large_file_on_9pfs.dat > /dev/null

Then there is a noticeable performance increase with higher transfer size values. That performance gain is continuous with rising transfer size values, but the performance increase obviously shrinks with rising transfer sizes as well, as with similar concepts in general like cache sizes, etc.

Then a secondary motivation is described in reason (2) of patch 2: if the transfer size is configurable on guest side (as is the case with the 9pfs 'msize' option), then there is the unpleasant side effect that the current virtio limit of 4M is invisible to guest; as this value of 4M is simply an arbitrary limit set on QEMU side in the past (probably just implementation motivated on QEMU side at that point), i.e. it is not a limit specified by the virtio protocol, nor is this limit made known to guest via virtio protocol at all. The consequence with 9pfs would be if user tries to go higher than 4M, then the system would simply hang with this QEMU error:

    virtio: too many write descriptors in indirect table

Now whether this is an issue or not for individual virtio users depends on whether the individual virtio user already had its own limitation <= 4M enforced on its side.

Best regards,
Christian Schoenebeck
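The 4M and 128M figures quoted above follow from simple arithmetic, sketched here for clarity. The sketch assumes a 4 KiB page and at most one page per descriptor; `max_transfer_bytes` is a hypothetical helper, not something defined in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on a single transfer if each descriptor covers at most one
 * page: queue_size descriptors times page_size bytes.
 */
static uint64_t max_transfer_bytes(uint32_t queue_size, uint32_t page_size)
{
    return (uint64_t)queue_size * page_size;
}
```

With a page size of 4096, a queue size of 1024 gives 4 MiB and a queue size of 32768 gives 128 MiB, matching the old and proposed limits.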
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05 2021, Halil Pasic wrote:

> On Mon, 4 Oct 2021 05:07:13 -0400
> "Michael S. Tsirkin" wrote:
>> Well we established that we can know. Here's an alternative explanation:
>
> I think we established how this should be in the future, where a transport
> specific mechanism is used to decide whether we are operating in legacy
> mode or in modern mode. But with the current QEMU reality, I don't think so.
> Namely currently the switch native-endian config -> little endian config
> happens when the VERSION_1 is negotiated, which may happen whenever
> the VERSION_1 bit is changed, or only when FEATURES_OK is set
> (vhost-user).
>
> This is consistent with "device should detect a legacy driver by checking
> for VERSION_1", which is what the spec currently says.
>
> So for transitional we start out with native-endian config. For modern
> only the config is always LE.
>
> The guest can distinguish between a legacy only device and a modern
> capable device after the revision negotiation. A legacy device would
> reject the CCW.
>
> But both a transitional device and a modern only device would accept
> a revision > 0. So the guest does not know for ccw.

Well, for pci I think the driver knows that it is using either legacy or modern, no?

And for ccw, the driver knows at that point in time which revision it negotiated, so it should know that a revision > 0 will use LE (and the device will obviously know that as well.)

Or am I misunderstanding what you're getting at?
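The endianness rule being discussed (little-endian config once VERSION_1 is negotiated, guest-native otherwise) can be sketched as a small decoder. This is an illustration only: `VIRTIO_F_VERSION_1` being feature bit 32 comes from the virtio spec, but `is_modern`, `config_read16` and the `guest_big_endian` flag are hypothetical and do not correspond to QEMU or kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32   /* feature bit number from the virtio spec */

/* Hypothetical helper: was VERSION_1 negotiated in this feature set? */
static bool is_modern(uint64_t features)
{
    return (features >> VIRTIO_F_VERSION_1) & 1;
}

/*
 * Decode a 16-bit config field from raw bytes.
 * Modern: always little-endian. Legacy: guest-native, which is modelled
 * here by the guest_big_endian flag (an assumption of this sketch).
 */
static uint16_t config_read16(const uint8_t raw[2], uint64_t features,
                              bool guest_big_endian)
{
    if (is_modern(features) || !guest_big_endian) {
        return (uint16_t)(raw[0] | (raw[1] << 8));
    }
    return (uint16_t)((raw[0] << 8) | raw[1]);
}
```

On a big-endian guest the same two bytes decode to different values depending on whether the device has seen VERSION_1 yet, which is why reading config before the features are written back can yield a byte-swapped value.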
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05, 2021 at 12:43:03PM +0200, Halil Pasic wrote: > On Mon, 4 Oct 2021 05:07:13 -0400 > "Michael S. Tsirkin" wrote: > > > On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote: > > > On Sat, 2 Oct 2021 14:13:37 -0400 > > > "Michael S. Tsirkin" wrote: > > > > > > > > Anyone else have an idea? This is a nasty regression; we could revert > > > > > the > > > > > patch, which would remove the symptoms and give us some time, but that > > > > > doesn't really feel right, I'd do that only as a last resort. > > > > > > > > Well we have Halil's hack (except I would limit it > > > > to only apply to BE, only do devices with validate, > > > > and only in modern mode), and we will fix QEMU to be spec compliant. > > > > Between these why do we need any conditional compiles? > > > > > > We don't. As I stated before, this hack is flawed because it > > > effectively breaks fencing features by the driver with QEMU. Some > > > features can not be unset after once set, because we tend to try to > > > enable the corresponding functionality whenever we see a write > > > features operation with the feature bit set, and we don't disable, if a > > > subsequent features write operation stores the feature bit as not set. > > > > Something to fix in QEMU too, I think. > > Possibly. But it is the same situation: it probably has a long > history. And it may even make some sense. The obvious trigger for > doing the conditional initialization for modern is the setting of > FEATURES_OK. The problem is, legacy doesn't do FEATURES_OK. So we would > need a different trigger. > > > > > > But it looks like VIRTIO_1 is fine to get cleared afterwards. > > > > We'd never clear it though - why would we? > > > > Right. > > > > So my hack > > > should actually look like posted below, modulo conditions. > > > > > > Looking at it some more, I see that vhost-user actually > > does not send features to the backend until FEATURES_OK. > > I.e. 
the hack does not work for transitional vhost-user devices, > but it doesn't break them either. > > Furthermore, I believe there is not much we can do to support > transitional devices with vhost-user and similar, without extending > the protocol. The transport specific detection idea would need a new > vhost-user thingy to tell the device what has been figured > out, right? > > In theory modern only could work, if the backends were paying extra > attention to endianness, instead of just assuming that the code is > running little-endian. I think a reasonable thing is to send SET_FEATURES before each GET_CONFIG, to tell backend which format is expected. > > However, the code in contrib for vhost-user-blk at least seems > > broken wrt endian-ness ATM. > > Agree. For example config is native endian ATM AFAICT. > > > What about other backends though? > > I think whenever the config is owned and managed by the vhost-backend > we have a problem with transitional. And we don't have everything in > the protocol to deal with this problem. > > I didn't check modern for the different vhost-user backends. I don't > think we recommend our users on s390 to use those. My understanding > of the use-cases is far form complete. > > > Hard to be sure right? > > I agree. > > > Cc Raphael and Stefan so they can take a look. > > And I guess it's time we CC'd qemu-devel too. > > > > For now I am beginning to think we should either revert or just limit > > validation to LE and think about all this some more. And I am inclining > > to do a revert. > > I'm fine with either of these as a quick fix, but we will eventually have > to find a solution. AFAICT this solution works for the s390 setups we > care about the most, but so would a revert. The reason I like this one is that it also fixes MTU for virtio net, and that one we can't really revert. > > > > These are all hypervisors that shipped for a long time. > > Do we need a flag for early config space access then? 
> > You mean a feature bit? I think it is a good idea even if > it weren't strictly necessary. We will have a behavior change > for some devices, and I think the ability to detect those > is valuable. > > Your spec change proposal, makes it IMHO pretty clear, that > we are changing our understanding of how transitional should work. > Strictly, transitional is not a normative part of the spec AFAIU, > but still... > > > > > > > > > > > > > > Regarding the conditions I guess checking that driver_features has > > > F_VERSION_1 already satisfies "only modern mode", or? > > > > Right. > > > > > For now > > > I've deliberately omitted the has verify and the is big endian > > > conditions so we have a better chance to see if something breaks > > > (i.e. the approach does not work). I can add in those extra conditions > > > later. > > > > Or maybe if we will go down that road just the verify check (for > > performance). I'm a bit unhappy we have the extra exit but consistency > > seems more important. > > >
Re: [PATCH v2 2/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Tuesday, 5 October 2021 09:16:07 CEST Michael S. Tsirkin wrote:
> On Mon, Oct 04, 2021 at 09:38:08PM +0200, Christian Schoenebeck wrote:
> > Raise the maximum possible virtio transfer size to 128M
> > (more precisely: 32k * PAGE_SIZE). See previous commit for a
> > more detailed explanation for the reasons of this change.
> >
> > For not breaking any virtio user, all virtio users transition
> > to using the new macro VIRTQUEUE_LEGACY_MAX_SIZE instead of
> > VIRTQUEUE_MAX_SIZE, so they are all still using the old value
> > of 1k with this commit.
> >
> > On the long-term, each virtio user should subsequently either
> > switch from VIRTQUEUE_LEGACY_MAX_SIZE to VIRTQUEUE_MAX_SIZE
> > after checking that they support the new value of 32k, or
> > otherwise they should replace the VIRTQUEUE_LEGACY_MAX_SIZE
> > macro by an appropriate value supported by them.
> >
> > Signed-off-by: Christian Schoenebeck
>
> I don't think we need this. Legacy isn't descriptive either. Just leave
> VIRTQUEUE_MAX_SIZE alone, and come up with a new name for 32k.

Does this mean you disagree that in the long term all virtio users should transition either to the new upper limit of 32k max queue size or introduce their own limit at their end?

Independent of the name, and I would appreciate suggestions for an adequate macro name here, I still think this new limit should be placed in the shared virtio.h file, because this value is not something invented on virtio user side. It rather reflects the theoretical upper limit possible with the virtio protocol, which is and will be common for all virtio users.

> > --- > > > > hw/9pfs/virtio-9p-device.c | 2 +- > > hw/block/vhost-user-blk.c | 6 +++--- > > hw/block/virtio-blk.c | 6 +++--- > > hw/char/virtio-serial-bus.c| 2 +- > > hw/input/virtio-input.c| 2 +- > > hw/net/virtio-net.c| 12 ++-- > > hw/scsi/virtio-scsi.c | 2 +- > > hw/virtio/vhost-user-fs.c | 6 +++--- > > hw/virtio/vhost-user-i2c.c | 2 +- > > hw/virtio/vhost-vsock-common.c | 2 +- > > hw/virtio/virtio-balloon.c | 2 +- > > hw/virtio/virtio-crypto.c | 2 +- > > hw/virtio/virtio-iommu.c | 2 +- > > hw/virtio/virtio-mem.c | 2 +- > > hw/virtio/virtio-mmio.c| 4 ++-- > > hw/virtio/virtio-pmem.c| 2 +- > > hw/virtio/virtio-rng.c | 3 ++- > > include/hw/virtio/virtio.h | 20 +++- > > 18 files changed, 49 insertions(+), 30 deletions(-) > > > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c > > index cd5d95dd51..9013e7df6e 100644 > > --- a/hw/9pfs/virtio-9p-device.c > > +++ b/hw/9pfs/virtio-9p-device.c > > @@ -217,7 +217,7 @@ static void virtio_9p_device_realize(DeviceState *dev, > > Error **errp)> > > v->config_size = sizeof(struct virtio_9p_config) + > > strlen(s->fsconf.tag); > > virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size, > > > > -VIRTQUEUE_MAX_SIZE); > > +VIRTQUEUE_LEGACY_MAX_SIZE); > > > > v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output); > > > > } > > > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > > index 336f56705c..e5e45262ab 100644 > > --- a/hw/block/vhost-user-blk.c > > +++ b/hw/block/vhost-user-blk.c > > @@ -480,9 +480,9 @@ static void vhost_user_blk_device_realize(DeviceState > > *dev, Error **errp)> > > error_setg(errp, "queue size must be non-zero"); > > return; > > > > } > > > > -if (s->queue_size > VIRTQUEUE_MAX_SIZE) { > > +if (s->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) { > > > > error_setg(errp, "queue size must not exceed %d", > > > > - VIRTQUEUE_MAX_SIZE); > > + VIRTQUEUE_LEGACY_MAX_SIZE); > > > > return; > > > > } > > > > @@ -491,7 +491,7 @@ static void 
vhost_user_blk_device_realize(DeviceState > > *dev, Error **errp)> > > } > > > > virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, > > > > -sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE); > > +sizeof(struct virtio_blk_config), > > VIRTQUEUE_LEGACY_MAX_SIZE);> > > s->virtqs = g_new(VirtQueue *, s->num_queues); > > for (i = 0; i < s->num_queues; i++) { > > > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > > index 9c0f46815c..5883e3e7db 100644 > > --- a/hw/block/virtio-blk.c > > +++ b/hw/block/virtio-blk.c > > @@ -1171,10 +1171,10 @@ static void virtio_blk_device_realize(DeviceState > > *dev, Error **errp)> > > return; > > > > } > > if (!is_power_of_2(conf->queue_size) || > > > > -conf->queue_size > VIRTQUEUE_MAX_SIZE) { > > +conf->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) { > > > > error_setg(errp, "invalid queue-size property (%" PRIu16 "), " > > > > "must be a power of 2 (max %d)", > > > >
Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Tue, Oct 05, 2021 at 01:10:56PM +0200, Christian Schoenebeck wrote:
> On Tuesday, 5 October 2021 09:38:53 CEST David Hildenbrand wrote:
> > On 04.10.21 21:38, Christian Schoenebeck wrote:
> > > At the moment the maximum transfer size with virtio is limited to 4M
> > > (1024 * PAGE_SIZE). This series raises this limit to its maximum
> > > theoretical possible transfer size of 128M (32k pages) according to the
> > > virtio specs:
> > >
> > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
> >
> > I'm missing the "why do we care". Can you comment on that?
>
> Primary motivation is the possibility of improved performance, e.g. in case
> of 9pfs, people can raise the maximum transfer size with the Linux 9p
> client's 'msize' option on guest side (and only on guest side actually).
> If guest performs large chunk I/O, e.g. consider something "useful" like
> this one on guest side:
>
>     time cat large_file_on_9pfs.dat > /dev/null
>
> Then there is a noticeable performance increase with higher transfer size
> values. That performance gain is continuous with rising transfer size values,
> but the performance increase obviously shrinks with rising transfer sizes as
> well, as with similar concepts in general like cache sizes, etc.
>
> Then a secondary motivation is described in reason (2) of patch 2: if the
> transfer size is configurable on guest side (like it is the case with the
> 9pfs 'msize' option), then there is the unpleasant side effect that the
> current virtio limit of 4M is invisible to guest; as this value of 4M is
> simply an arbitrary limit set on QEMU side in the past (probably just
> implementation motivated on QEMU side at that point), i.e. it is not a
> limit specified by the virtio protocol,

According to the spec it's specified, sure enough: vq size limits the size of indirect descriptors too. However, ever since commit 44ed8089e991a60d614abe0ee4b9057a28b364e4 we do not enforce it in the driver ...

> nor is this limit made known to guest via virtio protocol
> at all. The consequence with 9pfs would be if user tries to go higher than
> 4M, then the system would simply hang with this QEMU error:
>
>     virtio: too many write descriptors in indirect table
>
> Now whether this is an issue or not for individual virtio users, depends on
> whether the individual virtio user already had its own limitation <= 4M
> enforced on its side.
>
> Best regards,
> Christian Schoenebeck
>
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05, 2021 at 12:46:34PM +0200, Halil Pasic wrote: > On Tue, 5 Oct 2021 03:53:17 -0400 > "Michael S. Tsirkin" wrote: > > > > Wouldn't a call from transport code into virtio core > > > be more handy? What I have in mind is stuff like vhost-user and vdpa. My > > > understanding is, that for vhost setups where the config is outside qemu, > > > we probably need a new command that tells the vhost backend what > > > endiannes to use for config. I don't think we can use > > > VHOST_USER_SET_VRING_ENDIAN because that one is on a virtqueue basis > > > according to the doc. So for vhost-user and similar we would fire that > > > command and probably also set the filed, while for devices for which > > > control plane is handled by QEMU we would just set the field. > > > > > > Does that sound about right? > > > > I'm fine either way, but when would you invoke this? > > With my idea backends can check the field when get_config > > is invoked. > > > > As for using this in VHOST, can we maybe re-use SET_FEATURES? > > > > Kind of hacky but nice in that it will actually make existing backends > > work... > > Basically the equivalent of this patch, just on the vhost interface, > right? Could work I have to look into it :) yep
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, Oct 05, 2021 at 01:13:31PM +0200, Cornelia Huck wrote: > On Tue, Oct 05 2021, Halil Pasic wrote: > > > On Mon, 4 Oct 2021 05:07:13 -0400 > > "Michael S. Tsirkin" wrote: > >> Well we established that we can know. Here's an alternative explanation: > > > > > > I thin we established how this should be in the future, where a transport > > specific mechanism is used to decide are we operating in legacy mode or > > in modern mode. But with the current QEMU reality, I don't think so. > > Namely currently the switch native-endian config -> little endian config > > happens when the VERSION_1 is negotiated, which may happen whenever > > the VERSION_1 bit is changed, or only when FEATURES_OK is set > > (vhost-user). > > > > This is consistent with device should detect a legacy driver by checking > > for VERSION_1, which is what the spec currently says. > > > > So for transitional we start out with native-endian config. For modern > > only the config is always LE. > > > > The guest can distinguish between a legacy only device and a modern > > capable device after the revision negotiation. A legacy device would > > reject the CCW. > > > > But both a transitional device and a modern only device would accept > > a revision > 0. So the guest does not know for ccw. > > Well, for pci I think the driver knows that it is using either legacy or > modern, no? > > And for ccw, the driver knows at that point in time which revision it > negotiated, so it should know that a revision > 0 will use LE (and the > device will obviously know that as well.) > > Or am I misunderstanding what you're getting at? Exactly what I'm saying.
Re: [PATCH v2 2/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Tue, Oct 05, 2021 at 01:17:59PM +0200, Christian Schoenebeck wrote: > On Dienstag, 5. Oktober 2021 09:16:07 CEST Michael S. Tsirkin wrote: > > On Mon, Oct 04, 2021 at 09:38:08PM +0200, Christian Schoenebeck wrote: > > > Raise the maximum possible virtio transfer size to 128M > > > (more precisely: 32k * PAGE_SIZE). See previous commit for a > > > more detailed explanation for the reasons of this change. > > > > > > For not breaking any virtio user, all virtio users transition > > > to using the new macro VIRTQUEUE_LEGACY_MAX_SIZE instead of > > > VIRTQUEUE_MAX_SIZE, so they are all still using the old value > > > of 1k with this commit. > > > > > > On the long-term, each virtio user should subsequently either > > > switch from VIRTQUEUE_LEGACY_MAX_SIZE to VIRTQUEUE_MAX_SIZE > > > after checking that they support the new value of 32k, or > > > otherwise they should replace the VIRTQUEUE_LEGACY_MAX_SIZE > > > macro by an appropriate value supported by them. > > > > > > Signed-off-by: Christian Schoenebeck > > > > I don't think we need this. Legacy isn't descriptive either. Just leave > > VIRTQUEUE_MAX_SIZE alone, and come up with a new name for 32k. > > Does this mean you disagree that on the long-term all virtio users should > transition either to the new upper limit of 32k max queue size or introduce > their own limit at their end? depends. if 9pfs is the only one unhappy, we can keep 4k as the default. it's sure a safe one. > Independent of the name, and I would appreciate for suggestions for an > adequate macro name here, I still think this new limit should be placed in > the > shared virtio.h file. Because this value is not something invented on virtio > user side. It rather reflects the theoretical upper limited possible with the > virtio protocol, which is and will be common for all virtio users. We can add this to the linux uapi headers, sure. 
> > > --- > > > > > > hw/9pfs/virtio-9p-device.c | 2 +- > > > hw/block/vhost-user-blk.c | 6 +++--- > > > hw/block/virtio-blk.c | 6 +++--- > > > hw/char/virtio-serial-bus.c| 2 +- > > > hw/input/virtio-input.c| 2 +- > > > hw/net/virtio-net.c| 12 ++-- > > > hw/scsi/virtio-scsi.c | 2 +- > > > hw/virtio/vhost-user-fs.c | 6 +++--- > > > hw/virtio/vhost-user-i2c.c | 2 +- > > > hw/virtio/vhost-vsock-common.c | 2 +- > > > hw/virtio/virtio-balloon.c | 2 +- > > > hw/virtio/virtio-crypto.c | 2 +- > > > hw/virtio/virtio-iommu.c | 2 +- > > > hw/virtio/virtio-mem.c | 2 +- > > > hw/virtio/virtio-mmio.c| 4 ++-- > > > hw/virtio/virtio-pmem.c| 2 +- > > > hw/virtio/virtio-rng.c | 3 ++- > > > include/hw/virtio/virtio.h | 20 +++- > > > 18 files changed, 49 insertions(+), 30 deletions(-) > > > > > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c > > > index cd5d95dd51..9013e7df6e 100644 > > > --- a/hw/9pfs/virtio-9p-device.c > > > +++ b/hw/9pfs/virtio-9p-device.c > > > @@ -217,7 +217,7 @@ static void virtio_9p_device_realize(DeviceState *dev, > > > Error **errp)> > > > v->config_size = sizeof(struct virtio_9p_config) + > > > strlen(s->fsconf.tag); > > > virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size, > > > > > > -VIRTQUEUE_MAX_SIZE); > > > +VIRTQUEUE_LEGACY_MAX_SIZE); > > > > > > v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output); > > > > > > } > > > > > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > > > index 336f56705c..e5e45262ab 100644 > > > --- a/hw/block/vhost-user-blk.c > > > +++ b/hw/block/vhost-user-blk.c > > > @@ -480,9 +480,9 @@ static void vhost_user_blk_device_realize(DeviceState > > > *dev, Error **errp)> > > > error_setg(errp, "queue size must be non-zero"); > > > return; > > > > > > } > > > > > > -if (s->queue_size > VIRTQUEUE_MAX_SIZE) { > > > +if (s->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) { > > > > > > error_setg(errp, "queue size must not exceed %d", > > > > > > - VIRTQUEUE_MAX_SIZE); > > > + 
VIRTQUEUE_LEGACY_MAX_SIZE); > > > > > > return; > > > > > > } > > > > > > @@ -491,7 +491,7 @@ static void vhost_user_blk_device_realize(DeviceState > > > *dev, Error **errp)> > > > } > > > > > > virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, > > > > > > -sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE); > > > +sizeof(struct virtio_blk_config), > > > VIRTQUEUE_LEGACY_MAX_SIZE);> > > > s->virtqs = g_new(VirtQueue *, s->num_queues); > > > for (i = 0; i < s->num_queues; i++) { > > > > > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > > > index 9c0f46815c..5883e3e7db 100644 > > > --- a/hw/block/virtio-blk.c > > > +++ b/hw/block/virtio-blk.c > > > @@ -1171,10 +1171,10 @@ static v
Re: [PATCH v2 08/12] macfb: add common monitor modes supported by the MacOS toolbox ROM
On 05/10/2021 10:50, Laurent Vivier wrote: Le 04/10/2021 à 23:19, Mark Cave-Ayland a écrit : The monitor modes table is found by experimenting with the Monitors Control Panel in MacOS and analysing the reads/writes. From this it can be found that the mode is controlled by writes to the DAFB_MODE_CTRL1 and DAFB_MODE_CTRL2 registers. Implement the first block of DAFB registers as a register array including the existing sense register, the newly discovered control registers above, and also the DAFB_MODE_VADDR1 and DAFB_MODE_VADDR2 registers which are used by NetBSD to determine the current video mode. These experiments also show that the offset of the start of video RAM and the stride can change depending upon the monitor mode, so update macfb_draw_graphic() and both the BI_MAC_VADDR and BI_MAC_VROW bootinfo for the q800 machine accordingly. Finally update macfb_common_realize() so that only the resolution and depth supported by the display type can be specified on the command line. Signed-off-by: Mark Cave-Ayland Reviewed-by: Laurent Vivier --- hw/display/macfb.c | 124 - hw/display/trace-events| 1 + hw/m68k/q800.c | 11 ++-- include/hw/display/macfb.h | 16 - 4 files changed, 131 insertions(+), 21 deletions(-) diff --git a/hw/display/macfb.c b/hw/display/macfb.c index f98bcdec2d..357fe18be5 100644 --- a/hw/display/macfb.c +++ b/hw/display/macfb.c ... +static MacFbMode *macfb_find_mode(MacfbDisplayType display_type, + uint16_t width, uint16_t height, + uint8_t depth) +{ +MacFbMode *macfb_mode; +int i; + +for (i = 0; i < ARRAY_SIZE(macfb_mode_table); i++) { +macfb_mode = &macfb_mode_table[i]; + +if (display_type == macfb_mode->type && width == macfb_mode->width && +height == macfb_mode->height && depth == macfb_mode->depth) { +return macfb_mode; +} +} + +return NULL; +} + I misunderstood this part when I reviewed v1... It means you have to provide the monitor type to QEMU to switch from the default mode? 
Not as such: both the MacOS toolbox ROM and MacOS itself offer a fixed set of resolutions and depths based upon the display type. What I've done for now is default the display type to VGA since it offers both 640x480 and 800x600 in 1, 2, 4, 8, 16 and 24-bit colour, which should cover the most common use cases of people wanting to boot using the MacOS toolbox ROM. Even if you specify a default on the command line, MacOS still only cares about the display type and will allow you to change the resolution and depth dynamically, remembering the last resolution and depth across reboots.

During testing I found that having access to the 1152x870 resolution offered by the Apple 21" monitor display type was useful to allow larger screen sizes, although only up to 8-bit depth, so I added a bit of code that will switch from a VGA display type to a 21" display type if the graphics resolution is set to 1152x870x8.

Finally, if you boot a Linux kernel directly using -kernel then the provided XxYxD is placed directly into the relevant bootinfo fields with a VGA display type, unless a resolution of 1152x870x8 is specified, in which case the 21" display type is used as above.

> But, as a user, how do we know which modes are allowed with which
> resolution? Is it possible to set the type internally here according
> to the resolution?
>
> Could you provide a command line example of how to start the q800 with
> the 1152x870 resolution?

Sure - simply add "-g 1152x870x8" to your command line. If the -g parameter is omitted then the display type will default to VGA.

ATB,

Mark.
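For readers following along, the selection rule described in this reply can be sketched roughly as below. This is only an illustration under stated assumptions, not the macfb code: the `Sketch*` type and `sketch_*` function names are hypothetical, and the checks cover only the modes mentioned in this message (640x480 and 800x600 for VGA, 1152x870x8 for the 21" type).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the display types discussed in the thread. */
typedef enum {
    SKETCH_DISPLAY_VGA,
    SKETCH_DISPLAY_APPLE_21_COLOR,
} SketchDisplayType;

/*
 * Default to VGA; switch to the 21" display type only when the
 * requested mode is exactly 1152x870x8, as described in the message.
 */
static SketchDisplayType sketch_pick_display_type(uint16_t width,
                                                  uint16_t height,
                                                  uint8_t depth)
{
    if (width == 1152 && height == 870 && depth == 8) {
        return SKETCH_DISPLAY_APPLE_21_COLOR;
    }
    return SKETCH_DISPLAY_VGA;
}

/* VGA modes named in the message: 640x480 and 800x600 at six depths. */
static bool sketch_vga_mode_ok(uint16_t width, uint16_t height,
                               uint8_t depth)
{
    static const uint8_t depths[] = { 1, 2, 4, 8, 16, 24 };
    bool res_ok = (width == 640 && height == 480) ||
                  (width == 800 && height == 600);
    size_t i;

    for (i = 0; res_ok && i < sizeof(depths) / sizeof(depths[0]); i++) {
        if (depths[i] == depth) {
            return true;
        }
    }
    return false;
}
```

With this sketch, a request for 1152x870x8 selects the 21" type, while any other request stays on VGA and can then be checked against the VGA mode list.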
Re: [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature
On Mon, 4 Oct 2021 16:13:09 -0500 Eric DeVolder wrote: > Igor, thanks for the close examination. Inline responses below. > eric > > On 9/21/21 10:30 AM, Igor Mammedov wrote: > > On Thu, 5 Aug 2021 18:30:34 -0400 > > Eric DeVolder wrote: > > > >> This implements a PCI device for ACPI ERST. This implements the > >> non-NVRAM "mode" of operation for ERST as it is supported by > >> Linux and Windows. > >> > >> Signed-off-by: Eric DeVolder > >> --- > >> hw/acpi/erst.c | 750 > >> +++ > >> hw/acpi/meson.build | 1 + > >> hw/acpi/trace-events | 15 ++ > >> 3 files changed, 766 insertions(+) > >> create mode 100644 hw/acpi/erst.c > >> > >> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c > >> new file mode 100644 > >> index 000..eb4ab34 > >> --- /dev/null > >> +++ b/hw/acpi/erst.c > >> @@ -0,0 +1,750 @@ > >> +/* > >> + * ACPI Error Record Serialization Table, ERST, Implementation > >> + * > >> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009. > >> + * ACPI Platform Error Interfaces : Error Serialization > >> + * > >> + * Copyright (c) 2021 Oracle and/or its affiliates. 
> >> + * > >> + * SPDX-License-Identifier: GPL-2.0-or-later > >> + */ > >> + > >> +#include > >> +#include > >> +#include > >> + > >> +#include "qemu/osdep.h" > >> +#include "qapi/error.h" > >> +#include "hw/qdev-core.h" > >> +#include "exec/memory.h" > >> +#include "qom/object.h" > >> +#include "hw/pci/pci.h" > >> +#include "qom/object_interfaces.h" > >> +#include "qemu/error-report.h" > >> +#include "migration/vmstate.h" > >> +#include "hw/qdev-properties.h" > >> +#include "hw/acpi/acpi.h" > >> +#include "hw/acpi/acpi-defs.h" > >> +#include "hw/acpi/aml-build.h" > >> +#include "hw/acpi/bios-linker-loader.h" > >> +#include "exec/address-spaces.h" > >> +#include "sysemu/hostmem.h" > >> +#include "hw/acpi/erst.h" > >> +#include "trace.h" > >> + > >> +/* ACPI 4.0: Table 17-16 Serialization Actions */ > >> +#define ACTION_BEGIN_WRITE_OPERATION 0x0 > >> +#define ACTION_BEGIN_READ_OPERATION 0x1 > >> +#define ACTION_BEGIN_CLEAR_OPERATION 0x2 > >> +#define ACTION_END_OPERATION 0x3 > >> +#define ACTION_SET_RECORD_OFFSET 0x4 > >> +#define ACTION_EXECUTE_OPERATION 0x5 > >> +#define ACTION_CHECK_BUSY_STATUS 0x6 > >> +#define ACTION_GET_COMMAND_STATUS0x7 > >> +#define ACTION_GET_RECORD_IDENTIFIER 0x8 > >> +#define ACTION_SET_RECORD_IDENTIFIER 0x9 > >> +#define ACTION_GET_RECORD_COUNT 0xA > >> +#define ACTION_BEGIN_DUMMY_WRITE_OPERATION 0xB > >> +#define ACTION_RESERVED 0xC > >> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE 0xD > >> +#define ACTION_GET_ERROR_LOG_ADDRESS_LENGTH 0xE > >> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF > >> +#define ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10 > >> + > >> +/* ACPI 4.0: Table 17-17 Command Status Definitions */ > >> +#define STATUS_SUCCESS0x00 > >> +#define STATUS_NOT_ENOUGH_SPACE 0x01 > >> +#define STATUS_HARDWARE_NOT_AVAILABLE 0x02 > >> +#define STATUS_FAILED 0x03 > >> +#define STATUS_RECORD_STORE_EMPTY 0x04 > >> +#define STATUS_RECORD_NOT_FOUND 0x05 > >> + > >> + > >> +/* UEFI 2.1: Appendix N Common Platform Error Record 
*/ > >> +#define UEFI_CPER_RECORD_MIN_SIZE 128U > >> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U > >> +#define UEFI_CPER_RECORD_ID_OFFSET 96U > >> +#define IS_UEFI_CPER_RECORD(ptr) \ > >> +(((ptr)[0] == 'C') && \ > >> + ((ptr)[1] == 'P') && \ > >> + ((ptr)[2] == 'E') && \ > >> + ((ptr)[3] == 'R')) > >> +#define THE_UEFI_CPER_RECORD_ID(ptr) \ > >> +(*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET])) > >> + > >> +/* > >> + * This implementation is an ACTION (cmd) and VALUE (data) > >> + * interface consisting of just two 64-bit registers. > >> + */ > >> +#define ERST_REG_SIZE (16UL) > >> +#define ERST_ACTION_OFFSET (0UL) /* action (cmd) */ > >> +#define ERST_VALUE_OFFSET (8UL) /* argument/value (data) */ > >> + > >> +/* > >> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST > >> + * record contents. Thus, it defines the maximum record size. > >> + * As this is mapped through a PCI BAR, it must be a power of > >> + * two and larger than UEFI_CPER_RECORD_MIN_SIZE. > >> + * The backing storage is divided into fixed size "slots", > >> + * each ERST_RECORD_SIZE in length, and each "slot" > >> + * storing a single record. No attempt at optimizing storage > >> + * through compression, compaction, etc is attempted. > >> + * NOTE that slot 0 is reserved for the backing storage header. > >> + * Depending upon the size of the backing storage, additional > >> + * slots will be part of the slot 0 header in order to account > >> + * for a record_id for each available remaining slot. > >> + */ > >> +/* 8KiB records, not too small, not too big */ > >> +#define ERST_RECORD_SIZE (8192UL) > >> + > >> +#define ACP
Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Dienstag, 5. Oktober 2021 13:19:43 CEST Michael S. Tsirkin wrote: > On Tue, Oct 05, 2021 at 01:10:56PM +0200, Christian Schoenebeck wrote: > > On Dienstag, 5. Oktober 2021 09:38:53 CEST David Hildenbrand wrote: > > > On 04.10.21 21:38, Christian Schoenebeck wrote: > > > > At the moment the maximum transfer size with virtio is limited to 4M > > > > (1024 * PAGE_SIZE). This series raises this limit to its maximum > > > > theoretical possible transfer size of 128M (32k pages) according to > > > > the > > > > virtio specs: > > > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.h > > > > tml# > > > > x1-240006 > > > > > > I'm missing the "why do we care". Can you comment on that? > > > > Primary motivation is the possibility of improved performance, e.g. in > > case of 9pfs, people can raise the maximum transfer size with the Linux > > 9p client's 'msize' option on guest side (and only on guest side > > actually). If guest performs large chunk I/O, e.g. consider something > > "useful" like this one on> > > guest side: > > time cat large_file_on_9pfs.dat > /dev/null > > > > Then there is a noticable performance increase with higher transfer size > > values. That performance gain is continuous with rising transfer size > > values, but the performance increase obviously shrinks with rising > > transfer sizes as well, as with similar concepts in general like cache > > sizes, etc. > > > > Then a secondary motivation is described in reason (2) of patch 2: if the > > transfer size is configurable on guest side (like it is the case with the > > 9pfs 'msize' option), then there is the unpleasant side effect that the > > current virtio limit of 4M is invisible to guest; as this value of 4M is > > simply an arbitrarily limit set on QEMU side in the past (probably just > > implementation motivated on QEMU side at that point), i.e. 
it is not a > > limit specified by the virtio protocol, > > According to the spec it's specified, sure enough: vq size limits the > size of indirect descriptors too. In the virtio specs the only hard limit that I see is the aforementioned 32k: "Queue Size corresponds to the maximum number of buffers in the virtqueue. Queue Size value is always a power of 2. The maximum Queue Size value is 32768. This value is specified in a bus-specific way." > However, ever since commit 44ed8089e991a60d614abe0ee4b9057a28b364e4 we > do not enforce it in the driver ... Then there is the current queue size (that you probably mean) which is transmitted to guest with whatever virtio was initialized with. In case of 9p client however the virtio queue size is first initialized with some initial hard coded value when the 9p driver is loaded on Linux kernel guest side, then when some 9pfs is mounted later on by guest, it may include the 'msize' mount option to raise the transfer size, and that's the problem. I don't see any way for guest to see that it cannot go above that 4M transfer size now. > > nor is this limit be made aware to guest via virtio protocol > > at all. The consequence with 9pfs would be if user tries to go higher than > > 4M,> > > then the system would simply hang with this QEMU error: > > virtio: too many write descriptors in indirect table > > > > Now whether this is an issue or not for individual virtio users, depends > > on > > whether the individual virtio user already had its own limitation <= 4M > > enforced on its side. > > > > Best regards, > > Christian Schoenebeck
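The numbers in this thread (4M today, 128M with the spec maximum) follow from simple arithmetic, sketched below. This assumes 4 KiB pages and one page per descriptor, as the cover letter's "1024 * PAGE_SIZE" suggests; the `SKETCH_*` macros and `sketch_*` functions are hypothetical names, not QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, one page per descriptor. */
#define SKETCH_PAGE_SIZE       4096u
/* Values quoted in the thread: QEMU's current limit and the spec maximum. */
#define SKETCH_QUEUE_SIZE_QEMU 1024u
#define SKETCH_QUEUE_SIZE_SPEC 32768u

/* Per the spec text quoted above: a power of 2, at most 32768. */
static bool sketch_queue_size_valid(uint32_t n)
{
    return n != 0 && n <= SKETCH_QUEUE_SIZE_SPEC && (n & (n - 1)) == 0;
}

/* Upper bound on one transfer if every descriptor maps a full page. */
static uint64_t sketch_max_transfer_bytes(uint32_t queue_size)
{
    assert(sketch_queue_size_valid(queue_size));
    return (uint64_t)queue_size * SKETCH_PAGE_SIZE;
}
```

Under these assumptions a queue size of 1024 gives 4 MiB and a queue size of 32768 gives 128 MiB, matching the figures in the cover letter.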
Re: [PATCH v3 2/3] hw/virtio: Acquire RCU read lock in virtqueue_packed_drop_all()
On Mon, Oct 04, 2021 at 11:27:12AM +0200, Philippe Mathieu-Daudé wrote:
> On 10/4/21 11:23, Stefan Hajnoczi wrote:
> > On Mon, Sep 06, 2021 at 12:43:17PM +0200, Philippe Mathieu-Daudé wrote:
> > > vring_get_region_caches() must be called with the RCU read lock
> > > acquired. virtqueue_packed_drop_all() does not, and uses the 'caches'
> > > pointer. Fix that by using the RCU_READ_LOCK_GUARD() macro.
> >
> > Is this a bug that has been encountered, is it a latent bug, a code
> > cleanup, etc? The impact of this isn't clear but it sounds a little
> > scary so I wanted to check.
>
> I'll defer to Stefano, but IIUC it is a latent bug discovered during
> code audit.

Yep, I confirm this. We discovered it by discussing the documentation in
a previous series.

Thanks,
Stefano
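To illustrate what a scope-bound read-lock guard buys in a function like the one being fixed, here is a small mock. It is a sketch under assumptions, not QEMU's RCU implementation: every `sketch_*` and `SKETCH_*` name is hypothetical, the "lock" is just a counter, and the guard relies on the GCC/Clang `cleanup` attribute.

```c
#include <assert.h>

/* Mock read-side nesting counter; stands in for a real RCU read lock. */
static int sketch_rcu_depth;

static void sketch_rcu_read_lock(void)
{
    sketch_rcu_depth++;
}

static void sketch_rcu_read_unlock(void)
{
    assert(sketch_rcu_depth > 0);
    sketch_rcu_depth--;
}

/* Called automatically when the guard variable goes out of scope. */
static void sketch_guard_release(int *unused)
{
    (void)unused;
    sketch_rcu_read_unlock();
}

/* Hypothetical guard macro: lock now, unlock on every scope exit. */
#define SKETCH_RCU_READ_LOCK_GUARD()                                   \
    __attribute__((cleanup(sketch_guard_release), unused))             \
    int sketch_rcu_guard_ = (sketch_rcu_read_lock(), 0)

/* Mock accessor that, like the real one, requires the read lock. */
static int sketch_get_region_caches(void)
{
    assert(sketch_rcu_depth > 0);
    return 42;
}

static int sketch_drop_all(void)
{
    SKETCH_RCU_READ_LOCK_GUARD();
    int caches = sketch_get_region_caches();

    if (caches == 0) {
        return -1;      /* early return: the guard still unlocks */
    }
    return caches;      /* normal return: the guard unlocks here too */
}
```

The point of the guard form is that no return path can forget the unlock, which is why it suits a fix applied to an existing function with several exits.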
Re: [PATCH v4 06/11] hw/i386: Move vIOMMU uniqueness check into pc.c
Hi jean, On 10/1/21 7:33 PM, Jean-Philippe Brucker wrote: > We're about to need this check for a third vIOMMU, virtio-iommu, which > doesn't inherit X86IOMMUState as it doesn't support IRQ remapping and is > a virtio device. Move the check into the pre_plug callback to be shared > by all three vIOMMUs. > > Signed-off-by: Jean-Philippe Brucker Reviewed-by: Eric Auger Tested-by: Eric Auger Eric > --- > hw/i386/pc.c| 10 +- > hw/i386/x86-iommu.c | 6 -- > 2 files changed, 9 insertions(+), 7 deletions(-) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 557d49c9f8..789ccb6ef4 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -1367,6 +1367,13 @@ static void pc_virtio_md_pci_unplug(HotplugHandler > *hotplug_dev, > static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev, >DeviceState *dev, Error **errp) > { > +if (object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE) && > +x86_iommu_get_default()) { > +error_setg(errp, "QEMU does not support multiple vIOMMUs " > + "for x86 yet."); > +return; > +} > + > if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) { > pc_memory_pre_plug(hotplug_dev, dev, errp); > } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { > @@ -1428,7 +1435,8 @@ static HotplugHandler > *pc_get_hotplug_handler(MachineState *machine, > if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) || > object_dynamic_cast(OBJECT(dev), TYPE_CPU) || > object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) || > -object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) { > +object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI) || > +object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE)) { > return HOTPLUG_HANDLER(machine); > } > > diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c > index 86ad03972e..550e551993 100644 > --- a/hw/i386/x86-iommu.c > +++ b/hw/i386/x86-iommu.c > @@ -84,12 +84,6 @@ static void x86_iommu_set_default(X86IOMMUState *x86_iommu) > { > assert(x86_iommu); > > -if (x86_iommu_default) { > -error_report("QEMU does not support 
multiple vIOMMUs " > - "for x86 yet."); > -exit(1); > -} > - > x86_iommu_default = x86_iommu; > } >
Re: [PATCH v4 03/11] hw/arm/virt: Remove device tree restriction for virtio-iommu
Hi Jean, On 10/1/21 7:33 PM, Jean-Philippe Brucker wrote: > virtio-iommu is now supported with ACPI VIOT as well as device tree. > Remove the restriction that prevents from instantiating a virtio-iommu > device under ACPI. > > Reviewed-by: Eric Auger > Signed-off-by: Jean-Philippe Brucker > --- > hw/arm/virt.c| 10 ++ > hw/virtio/virtio-iommu-pci.c | 7 --- > 2 files changed, 2 insertions(+), 15 deletions(-) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 1d59f0e59f..56e8fc7059 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -2561,16 +2561,10 @@ static HotplugHandler > *virt_machine_get_hotplug_handler(MachineState *machine, > MachineClass *mc = MACHINE_GET_CLASS(machine); > > if (device_is_dynamic_sysbus(mc, dev) || > - (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) { > +object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) || > +object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) { > return HOTPLUG_HANDLER(machine); > } > -if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) { > -VirtMachineState *vms = VIRT_MACHINE(machine); > - > -if (!vms->bootinfo.firmware_loaded || !virt_is_acpi_enabled(vms)) { > -return HOTPLUG_HANDLER(machine); > -} > -} > return NULL; > } > > diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c > index 770c286be7..f30eb16cbf 100644 > --- a/hw/virtio/virtio-iommu-pci.c > +++ b/hw/virtio/virtio-iommu-pci.c > @@ -48,16 +48,9 @@ static void virtio_iommu_pci_realize(VirtIOPCIProxy > *vpci_dev, Error **errp) > VirtIOIOMMU *s = VIRTIO_IOMMU(vdev); > > if (!qdev_get_machine_hotplug_handler(DEVICE(vpci_dev))) { > -MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine()); > - > -error_setg(errp, > - "%s machine fails to create iommu-map device tree > bindings", > - mc->name); actually this does not work. To add a hint you need the *errp to be set. 
Otherwise, when running through this path you will get:

qemu-system-x86_64: ../util/error.c:158: error_append_hint: Assertion `err && errp != &error_abort && errp != &error_fatal' failed.

Replace the error_append_hint() with an error_setg() (without the \n).

Thanks

Eric

> error_append_hint(errp,
> "Check your machine implements a hotplug handler "
> "for the virtio-iommu-pci device\n");
> -error_append_hint(errp, "Check the guest is booted without FW or with "
> - "-no-acpi\n");
> return;
> }
> for (int i = 0; i < s->nb_reserved_regions; i++) {
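The ordering constraint behind this review comment (an error must be set before a hint can be appended to it) can be shown with a small mock. error_setg() and error_append_hint() are QEMU's real functions; the `MockError` type and `mock_*` functions below are simplified, hypothetical stand-ins written only for this sketch, and the mock returns -1 where the real code would hit the assertion quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an error object: a message plus one hint. */
typedef struct MockError {
    char msg[128];
    char hint[128];
} MockError;

/* Mock of "set an error": allocates the object and stores the message. */
static void mock_error_setg(MockError **errp, const char *msg)
{
    MockError *err;

    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    *errp = err;
}

/*
 * Mock of "append a hint": only valid once an error has been set.
 * Returns -1 in the case where the real function would assert.
 */
static int mock_error_append_hint(MockError **errp, const char *hint)
{
    if (!errp || !*errp) {
        return -1;
    }
    snprintf((*errp)->hint, sizeof((*errp)->hint), "%s", hint);
    return 0;
}
```

In this mock, appending a hint to an unset error fails, while setting the error first and then appending the hint succeeds, which is the order the review asks for.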
Re: [RFC PATCH 1/1] virtio: write back features before verify
On Tue, 05 Oct 2021 13:13:31 +0200
Cornelia Huck wrote:

> On Tue, Oct 05 2021, Halil Pasic wrote:
>
> > On Mon, 4 Oct 2021 05:07:13 -0400
> > "Michael S. Tsirkin" wrote:
> >> Well we established that we can know. Here's an alternative explanation:
> >
> > I think we established how this should be in the future, where a transport
> > specific mechanism is used to decide are we operating in legacy mode or
> > in modern mode. But with the current QEMU reality, I don't think so.
> > Namely currently the switch native-endian config -> little endian config
> > happens when the VERSION_1 is negotiated, which may happen whenever
> > the VERSION_1 bit is changed, or only when FEATURES_OK is set
> > (vhost-user).
> >
> > This is consistent with device should detect a legacy driver by checking
> > for VERSION_1, which is what the spec currently says.
> >
> > So for transitional we start out with native-endian config. For modern
> > only the config is always LE.
> >
> > The guest can distinguish between a legacy only device and a modern
> > capable device after the revision negotiation. A legacy device would
> > reject the CCW.
> >
> > But both a transitional device and a modern only device would accept
> > a revision > 0. So the guest does not know for ccw.
>
> Well, for pci I think the driver knows that it is using either legacy or
> modern, no?

It is mighty complicated. virtio-blk-pci-non-transitional and
virtio-net-pci-non-transitional will give you BE, but virtio-crypto-pci,
which is also non-transitional, will get you LE before VERSION_1 is set
(because virtio-crypto uses stl_le_p()). That is a fact.

The deal is that virtio-blk and virtio-net were written with transitional
in mind, and the config code is the same for transitional and
non-transitional.

That is how things are now. With the QEMU changes things will be simpler.

>
> And for ccw, the driver knows at that point in time which revision it
> negotiated, so it should know that a revision > 0 will use LE (and the
> device will obviously know that as well.)

With the future changes in QEMU, yes. Without these changes, no. Without
these changes we get BE when the guest code thinks it is going to get LE.
That is what causes the regression.

The commit message for this patch is written from the perspective of right
now, and not from the perspective of future changes.

Or can you hack up a guest patch that looks at the revision, figures out
what endianness the early config access is in, and does the right thing?
I don't think so. I tried to explain why that is impossible. Because that
would be preferable to messing with the device and introducing another
exit.

>
> Or am I misunderstanding what you're getting at?
>

Probably. I'm talking about the state before "do transport specific legacy
detection in the device instead of looking at VERSION_1"; you are probably
talking about the state after it. If we had this new behavior for all
relevant hypervisors then we wouldn't need to do a thing in the guest. The
current code would work like a charm.

Does that answer your question?

Regards,
Halil
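The mismatch described in this message can be made concrete with a byte-level sketch. It assumes a big-endian host (so "native-endian" means BE) and a device that switches its config layout only once VERSION_1 has been negotiated; the `sketch_*` names are hypothetical and this is not the QEMU or Linux code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read a 16-bit config field as little-endian (what a modern driver expects). */
static uint16_t sketch_load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 16-bit config field as big-endian (native on the assumed host). */
static uint16_t sketch_load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical device side: the config layout is native-endian (BE here)
 * until VERSION_1 has been negotiated, and little-endian afterwards.
 */
static void sketch_device_store_config16(uint8_t *cfg, uint16_t val,
                                         bool version_1_negotiated)
{
    if (version_1_negotiated) {
        cfg[0] = (uint8_t)(val & 0xff);
        cfg[1] = (uint8_t)(val >> 8);
    } else {
        cfg[0] = (uint8_t)(val >> 8);
        cfg[1] = (uint8_t)(val & 0xff);
    }
}
```

If the driver reads the field as LE before the device has seen VERSION_1, it gets a byte-swapped value; once the features have been written back, the same LE read returns the intended value. That early, byte-swapped read is the situation the patch subject ("write back features before verify") addresses.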
Re: [PATCH v2 2/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
On Dienstag, 5. Oktober 2021 13:24:36 CEST Michael S. Tsirkin wrote: > On Tue, Oct 05, 2021 at 01:17:59PM +0200, Christian Schoenebeck wrote: > > On Dienstag, 5. Oktober 2021 09:16:07 CEST Michael S. Tsirkin wrote: > > > On Mon, Oct 04, 2021 at 09:38:08PM +0200, Christian Schoenebeck wrote: > > > > Raise the maximum possible virtio transfer size to 128M > > > > (more precisely: 32k * PAGE_SIZE). See previous commit for a > > > > more detailed explanation for the reasons of this change. > > > > > > > > For not breaking any virtio user, all virtio users transition > > > > to using the new macro VIRTQUEUE_LEGACY_MAX_SIZE instead of > > > > VIRTQUEUE_MAX_SIZE, so they are all still using the old value > > > > of 1k with this commit. > > > > > > > > On the long-term, each virtio user should subsequently either > > > > switch from VIRTQUEUE_LEGACY_MAX_SIZE to VIRTQUEUE_MAX_SIZE > > > > after checking that they support the new value of 32k, or > > > > otherwise they should replace the VIRTQUEUE_LEGACY_MAX_SIZE > > > > macro by an appropriate value supported by them. > > > > > > > > Signed-off-by: Christian Schoenebeck > > > > > > I don't think we need this. Legacy isn't descriptive either. Just leave > > > VIRTQUEUE_MAX_SIZE alone, and come up with a new name for 32k. > > > > Does this mean you disagree that on the long-term all virtio users should > > transition either to the new upper limit of 32k max queue size or > > introduce > > their own limit at their end? > > depends. if 9pfs is the only one unhappy, we can keep 4k as > the default. it's sure a safe one. > > > Independent of the name, and I would appreciate for suggestions for an > > adequate macro name here, I still think this new limit should be placed in > > the shared virtio.h file. Because this value is not something invented on > > virtio user side. It rather reflects the theoretical upper limited > > possible with the virtio protocol, which is and will be common for all > > virtio users. 
> We can add this to the linux uapi headers, sure.

Well, then I'll wait for a few days, and if nobody else cares about this issue, I'll just hard-code 32k exclusively on the 9pfs side in v3 for now, and that's it.

Best regards,
Christian Schoenebeck
Re: Deprecate the ppc405 boards in QEMU?
On Tue, 5 Oct 2021, Thomas Huth wrote:
> On 05/10/2021 10.07, Thomas Huth wrote:
> > On 05/10/2021 10.05, Alexey Kardashevskiy wrote:
> > [...]
> > > What is so special about taihu?
> >
> > taihu is the other 405 board defined in hw/ppc/ppc405_boards.c (which I
> > suggested to deprecate now)
>
> I've now also played with the u-boot sources a little bit, and with some bit
> of tweaking, it's indeed possible to compile the old taihu board there.
> However, it does not really work with QEMU anymore, it immediately triggers
> an assert():
>
> $ qemu-system-ppc -M taihu -bios u-boot.bin -serial null -serial mon:stdio
> **
> ERROR:accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed: (qemu_mutex_iothread_locked())
> Aborted (core dumped)

Maybe it's similar to this: 2025fc6766ab25501e0041c564c44bb0f7389774

The helper_load_dcr() and helper_store_dcr() in target/ppc/timebase_helper.c seem to lock/unlock the iothread but I'm not sure if that's necessary. Also not sure why this does not happen with 460ex but that maybe uses different code.

> Going back to QEMU v2.3.0, I can see at least a little bit of output, but it
> then also triggers an assert() during DRAM initialization:
>
> $ qemu-system-ppc -M taihu -bios u-boot.bin -serial null -serial mon:stdio
> Reset PowerPC core
>
> U-Boot 2014.10-rc2-00123-g461be2f96e-dirty (Oct 05 2021 - 10:02:56)
>
> CPU: AMCC PowerPC 405EP Rev. B at 770 MHz (PLB=256 OPB=128 EBC=128)
> I2C boot EEPROM disabled
> Internal PCI arbiter enabled
> 16 KiB I-Cache 16 KiB D-Cache
> Board: Taihu - AMCC PPC405EP Evaluation Board
> I2C: ready
> DRAM: qemu-system-ppc: memory.c:1693: memory_region_del_subregion: Assertion `subregion->container == mr' failed.
> Aborted (core dumped)
>
> Not sure if this ever worked in QEMU, maybe in the early 0.15 time, but that
> version of QEMU also does not compile easily anymore on modern systems.
>
> So I'm afraid, getting this into a workable shape again will take a lot of
> time. At least I'll stop my efforts here now.

Do you have this u-boot binary somewhere just for others who want to try it?

Regards,
BALATON Zoltan
Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)
On Tue, 5 Oct 2021, Cédric Le Goater wrote: On 10/5/21 08:18, Alexey Kardashevskiy wrote: On 05/10/2021 15:44, Christophe Leroy wrote: Le 05/10/2021 à 02:48, David Gibson a écrit : On Fri, Oct 01, 2021 at 04:18:49PM +0200, Thomas Huth wrote: On 01/10/2021 15.04, Christophe Leroy wrote: Le 01/10/2021 à 14:04, Thomas Huth a écrit : On 01/10/2021 13.12, Peter Maydell wrote: On Fri, 1 Oct 2021 at 10:43, Thomas Huth wrote: Nevertheless, as long as nobody has a hint where to find that ppc405_rom.bin, I think both boards are pretty useless in QEMU (as far as I can see, they do not work without the bios at all, so it's also not possible to use a Linux image with the "-kernel" CLI option directly). It is at least in theory possible to run bare-metal code on either board, by passing either a pflash or a bios argument. True. I did some more research, and seems like there was once support for those boards in u-boot, but it got removed there a couple of years ago already: https://gitlab.com/qemu-project/u-boot/-/commit/98f705c9cefdf https://gitlab.com/qemu-project/u-boot/-/commit/b147ff2f37d5b https://gitlab.com/qemu-project/u-boot/-/commit/7514037bcdc37 But I agree that there seem to be no signs of anybody actually successfully using these boards for anything, so we should deprecate-and-delete them. Yes, let's mark them as deprecated now ... if someone still uses them and speaks up, we can still revert the deprecation again. I really would like to be able to use them to validate Linux Kernel changes, hence looking for that missing BIOS. If we remove ppc405 from QEMU, we won't be able to do any regression tests of Linux Kernel on those processors. If you/someone managed to compile an old version of u-boot for one of these two boards, so that we would finally have something for regression testing, we can of course also keep the boards in QEMU... 
I can see that it would be useful for some cases, but unless someone volunteers to track down the necessary firmware and look after it, I think we do need to deprecate it - I certainly don't have the capacity to look into this. I will look at it, please allow me a few weeks though. Well, building it was not hard but now I'd like to know what board QEMU actually emulates, there are way too many codenames and PVRs. Yes. We should try to reduce the list below. Deprecating embedded machines is one way. Why should we reduce that list? It's good to have different CPU options when one wants to test code for different PPC versions (maybe also in user mode) or just to have a quick list of these in one place. Regards, BALATON Zoltan
Re: [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list
On Thu, Sep 30, 2021 at 11:30:37AM -0400, Vivek Goyal wrote: > g_usleep() calls nanosleep() and that now seems to call clock_nanosleep() > syscall. Now these patches are making use of g_usleep(). So add > clock_nanosleep() to list of allowed syscalls. > > Signed-off-by: Vivek Goyal > --- > tools/virtiofsd/passthrough_seccomp.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/virtiofsd/passthrough_seccomp.c > b/tools/virtiofsd/passthrough_seccomp.c > index cd24b40b78..03080806c0 100644 > --- a/tools/virtiofsd/passthrough_seccomp.c > +++ b/tools/virtiofsd/passthrough_seccomp.c > @@ -117,6 +117,7 @@ static const int syscall_allowlist[] = { > SCMP_SYS(writev), > SCMP_SYS(umask), > SCMP_SYS(nanosleep), > +SCMP_SYS(clock_nanosleep), This patch can be dropped once sleep has been replaced by a condvar.
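The review comment above suggests replacing the sleep with a condition variable. A minimal sketch of that general pattern in plain pthreads follows. All names here (lock_waiter, waiter_init, waiter_signal, waiter_wait) are hypothetical and not taken from virtiofsd, and whether such a change really makes the clock_nanosleep allow-list entry unnecessary depends on which syscalls the chosen primitive ends up using, which is not checked here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical waiter state; none of these names come from virtiofsd. */
struct lock_waiter {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool lock_available;
};

void waiter_init(struct lock_waiter *w)
{
    pthread_mutex_init(&w->mu, NULL);
    pthread_cond_init(&w->cv, NULL);
    w->lock_available = false;
}

/* Called by whoever makes the lock available: set the flag, wake a waiter. */
void waiter_signal(struct lock_waiter *w)
{
    pthread_mutex_lock(&w->mu);
    w->lock_available = true;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->mu);
}

/*
 * Block until signalled or until timeout_ms has elapsed.
 * Returns true if the flag was set, false on timeout or error.
 */
bool waiter_wait(struct lock_waiter *w, long timeout_ms)
{
    struct timespec deadline;
    bool ok;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline
     * for a condvar created with default attributes. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->mu);
    /* Loop to tolerate spurious wakeups; stop on timeout or any error. */
    while (!w->lock_available) {
        if (pthread_cond_timedwait(&w->cv, &w->mu, &deadline) != 0) {
            break;
        }
    }
    ok = w->lock_available;
    pthread_mutex_unlock(&w->mu);
    return ok;
}
```

Compared with polling in a sleep loop, the waiter wakes as soon as waiter_signal() runs instead of at the next polling interval.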
Re: [PATCH 12/13] virtiofsd: Implement blocking posix locks
On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote: > As of now we don't support fcntl(F_SETLKW) and if we see one, we return > -EOPNOTSUPP. > > Change that by accepting these requests and returning a reply > immediately asking caller to wait. Once lock is available, send a > notification to the waiter indicating lock is available. > > In response to lock request, we are returning error value as "1", which > signals to client to queue the lock request internally and later client > will get a notification which will signal lock is taken (or error). And > then fuse client should wake up the guest process. > > Signed-off-by: Vivek Goyal > Signed-off-by: Ioannis Angelakopoulos > --- > tools/virtiofsd/fuse_lowlevel.c | 37 - > tools/virtiofsd/fuse_lowlevel.h | 26 > tools/virtiofsd/fuse_virtio.c| 50 --- > tools/virtiofsd/passthrough_ll.c | 70 > 4 files changed, 167 insertions(+), 16 deletions(-) > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c > index e4679c73ab..2e7f4b786d 100644 > --- a/tools/virtiofsd/fuse_lowlevel.c > +++ b/tools/virtiofsd/fuse_lowlevel.c > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, > struct iovec *iov, > .unique = req->unique, > .error = error, > }; > - > -if (error <= -1000 || error > 0) { > +/* error = 1 has been used to signal client to wait for notificaiton */ > +if (error <= -1000 || error > 1) { > fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error); > out.error = -ERANGE; > } > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err) > return send_reply(req, -err, NULL, 0); > } > > +int fuse_reply_wait(fuse_req_t req) > +{ > +return send_reply(req, 1, NULL, 0); > +} > + > void fuse_reply_none(fuse_req_t req) > { > fuse_free_req(req); > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t > nodeid, > send_reply_ok(req, NULL, 0); > } > > +static int send_notify_iov(struct fuse_session *se, int notify_code, > + struct iovec *iov, int 
count) > +{ > +struct fuse_out_header out; > +if (!se->got_init) { > +return -ENOTCONN; > +} > +out.unique = 0; > +out.error = notify_code; > +iov[0].iov_base = &out; > +iov[0].iov_len = sizeof(struct fuse_out_header); > +return fuse_send_msg(se, NULL, iov, count); > +} > + > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique, > + int32_t error) > +{ > +struct fuse_notify_lock_out outarg = {0}; > +struct iovec iov[2]; > + > +outarg.unique = unique; > +outarg.error = -error; > + > +iov[1].iov_base = &outarg; > +iov[1].iov_len = sizeof(outarg); > +return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2); > +} > + > int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino, > off_t offset, struct fuse_bufvec *bufv) > { > diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h > index c55c0ca2fc..64624b48dc 100644 > --- a/tools/virtiofsd/fuse_lowlevel.h > +++ b/tools/virtiofsd/fuse_lowlevel.h > @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops { > */ > int fuse_reply_err(fuse_req_t req, int err); > > +/** > + * Ask caller to wait for lock. > + * > + * Possible requests: > + * setlkw > + * > + * If caller sends a blocking lock request (setlkw), then reply to caller > + * that wait for lock to be available. Once lock is available caller will I can't parse the first sentence. s/that wait for lock to be available/that waiting for the lock is necessary/? > + * receive a notification with request's unique id. Notification will > + * carry info whether lock was successfully obtained or not. 
> + * > + * @param req request handle > + * @return zero for success, -errno for failure to send reply > + */ > +int fuse_reply_wait(fuse_req_t req); > + > /** > * Don't send reply > * > @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session > *se, fuse_ino_t parent, > int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino, > off_t offset, struct fuse_bufvec *bufv); > > +/** > + * Notify event related to previous lock request > + * > + * @param se the session object > + * @param unique the unique id of the request which requested setlkw > + * @param error zero for success, -errno for the failure > + */ > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique, > + int32_t error); > + > /* > * Utility functions > */ > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c > index a87e88e286..bb2d4456fc 100644 > --- a/tools/virtiofsd/fuse_virtio.c > +++ b/tools/virtiofsd/fuse_virtio.c > @@ -273,6 +273,23 @@ static void vq_send_element(struct fv
Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)
On 05/10/2021 14.20, BALATON Zoltan wrote: On Tue, 5 Oct 2021, Cédric Le Goater wrote: On 10/5/21 08:18, Alexey Kardashevskiy wrote: On 05/10/2021 15:44, Christophe Leroy wrote: Le 05/10/2021 à 02:48, David Gibson a écrit : On Fri, Oct 01, 2021 at 04:18:49PM +0200, Thomas Huth wrote: On 01/10/2021 15.04, Christophe Leroy wrote: Le 01/10/2021 à 14:04, Thomas Huth a écrit : On 01/10/2021 13.12, Peter Maydell wrote: On Fri, 1 Oct 2021 at 10:43, Thomas Huth wrote: Nevertheless, as long as nobody has a hint where to find that ppc405_rom.bin, I think both boards are pretty useless in QEMU (as far as I can see, they do not work without the bios at all, so it's also not possible to use a Linux image with the "-kernel" CLI option directly). It is at least in theory possible to run bare-metal code on either board, by passing either a pflash or a bios argument. True. I did some more research, and seems like there was once support for those boards in u-boot, but it got removed there a couple of years ago already: https://gitlab.com/qemu-project/u-boot/-/commit/98f705c9cefdf https://gitlab.com/qemu-project/u-boot/-/commit/b147ff2f37d5b https://gitlab.com/qemu-project/u-boot/-/commit/7514037bcdc37 But I agree that there seem to be no signs of anybody actually successfully using these boards for anything, so we should deprecate-and-delete them. Yes, let's mark them as deprecated now ... if someone still uses them and speaks up, we can still revert the deprecation again. I really would like to be able to use them to validate Linux Kernel changes, hence looking for that missing BIOS. If we remove ppc405 from QEMU, we won't be able to do any regression tests of Linux Kernel on those processors. If you/someone managed to compile an old version of u-boot for one of these two boards, so that we would finally have something for regression testing, we can of course also keep the boards in QEMU... 
I can see that it would be useful for some cases, but unless someone volunteers to track down the necessary firmware and look after it, I think we do need to deprecate it - I certainly don't have the capacity to look into this. I will look at it, please allow me a few weeks though. Well, building it was not hard but now I'd like to know what board QEMU actually emulates, there are way too many codenames and PVRs. Yes. We should try to reduce the list below. Deprecating embedded machines is one way. Why should we reduce that list? It's good to have different CPU options when one wants to test code for different PPC versions (maybe also in user mode) or just to have a quick list of these in one place. I think there are many CPUs in that list which cannot be used with any board, some of them might also be in a very incomplete state. So presenting such a big list to the users is confusing and might create wrong expectations. It would be good to remove at least the CPUs which are really completely useless. Thomas
Re: Deprecate the ppc405 boards in QEMU?
On 05/10/2021 14.17, BALATON Zoltan wrote: On Tue, 5 Oct 2021, Thomas Huth wrote: On 05/10/2021 10.07, Thomas Huth wrote: On 05/10/2021 10.05, Alexey Kardashevskiy wrote: [...] What is so special about taihu? taihu is the other 405 board defined in hw/ppc/ppc405_boards.c (which I suggested to deprecate now) I've now also played with the u-boot sources a little bit, and with some bit of tweaking, it's indeed possible to compile the old taihu board there. However, it does not really work with QEMU anymore, it immediately triggers an assert(): $ qemu-system-ppc -M taihu -bios u-boot.bin -serial null -serial mon:stdio ** ERROR:accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed: (qemu_mutex_iothread_locked()) Aborted (core dumped) Maybe it's similar to this: 2025fc6766ab25501e0041c564c44bb0f7389774 The helper_load_dcr() and helper_store_dcr() in target/ppc/timebase_helper.c seem to lock/unlock the iothread but I'm not sure if that's necessary. Also not sure why this does not happen with 460ex but that maybe uses different code. It's rather the other way round, the locking is missing here instead. 
I can get the serial output with the current QEMU when I add the following patch (not sure whether that's the right spot, though): diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c index f5d012f860..bb57f1c9ed 100644 --- a/hw/ppc/ppc.c +++ b/hw/ppc/ppc.c @@ -336,6 +336,8 @@ void store_40x_dbcr0(CPUPPCState *env, uint32_t val) { PowerPCCPU *cpu = env_archcpu(env); +qemu_mutex_lock_iothread(); + switch ((val >> 28) & 0x3) { case 0x0: /* No action */ @@ -353,6 +355,8 @@ void store_40x_dbcr0(CPUPPCState *env, uint32_t val) ppc40x_system_reset(cpu); break; } + +qemu_mutex_unlock_iothread(); } /* PowerPC 40x internal IRQ controller */ Going back to QEMU v2.3.0, I can see at least a little bit of output, but it then also triggers an assert() during DRAM initialization: $ qemu-system-ppc -M taihu -bios u-boot.bin -serial null -serial mon:stdio Reset PowerPC core U-Boot 2014.10-rc2-00123-g461be2f96e-dirty (Oct 05 2021 - 10:02:56) CPU: AMCC PowerPC 405EP Rev. B at 770 MHz (PLB=256 OPB=128 EBC=128) I2C boot EEPROM disabled Internal PCI arbiter enabled 16 KiB I-Cache 16 KiB D-Cache Board: Taihu - AMCC PPC405EP Evaluation Board I2C: ready DRAM: qemu-system-ppc: memory.c:1693: memory_region_del_subregion: Assertion `subregion->container == mr' failed. Aborted (core dumped) Not sure if this ever worked in QEMU, maybe in the early 0.15 time, but that version of QEMU also does not compile easily anymore on modern systems. So I'm afraid, getting this into a workable shape again will take a lot of time. At least I'll stop my efforts here now. Do you have this u-boot binary somewhere just for others who want to try it? FWIW: http://people.redhat.com/~thuth/data/u-boot-taihu.bin Thomas
Re: [PATCH 08/13] virtiofsd: Create a notification queue
On Tue, Oct 05, 2021 at 09:14:14AM +0100, Stefan Hajnoczi wrote: > On Mon, Oct 04, 2021 at 05:01:07PM -0400, Vivek Goyal wrote: > > On Mon, Oct 04, 2021 at 03:30:38PM +0100, Stefan Hajnoczi wrote: > > > On Thu, Sep 30, 2021 at 11:30:32AM -0400, Vivek Goyal wrote: > > > > Add a notification queue which will be used to send async notifications > > > > for file lock availability. > > > > > > > > Signed-off-by: Vivek Goyal > > > > Signed-off-by: Ioannis Angelakopoulos > > > > --- > > > > hw/virtio/vhost-user-fs-pci.c | 4 +- > > > > hw/virtio/vhost-user-fs.c | 62 +-- > > > > include/hw/virtio/vhost-user-fs.h | 2 + > > > > tools/virtiofsd/fuse_i.h | 1 + > > > > tools/virtiofsd/fuse_virtio.c | 70 +++ > > > > 5 files changed, 116 insertions(+), 23 deletions(-) > > > > > > > > diff --git a/hw/virtio/vhost-user-fs-pci.c > > > > b/hw/virtio/vhost-user-fs-pci.c > > > > index 2ed8492b3f..cdb9471088 100644 > > > > --- a/hw/virtio/vhost-user-fs-pci.c > > > > +++ b/hw/virtio/vhost-user-fs-pci.c > > > > @@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy > > > > *vpci_dev, Error **errp) > > > > DeviceState *vdev = DEVICE(&dev->vdev); > > > > > > > > if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) { > > > > -/* Also reserve config change and hiprio queue vectors */ > > > > -vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2; > > > > +/* Also reserve config change, hiprio and notification queue > > > > vectors */ > > > > +vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3; > > > > } > > > > > > > > qdev_realize(vdev, BUS(&vpci_dev->bus), errp); > > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > > > > index d1efbc5b18..6bafcf0243 100644 > > > > --- a/hw/virtio/vhost-user-fs.c > > > > +++ b/hw/virtio/vhost-user-fs.c > > > > @@ -31,6 +31,7 @@ static const int user_feature_bits[] = { > > > > VIRTIO_F_NOTIFY_ON_EMPTY, > > > > VIRTIO_F_RING_PACKED, > > > > VIRTIO_F_IOMMU_PLATFORM, > > > > +VIRTIO_FS_F_NOTIFICATION, > > > > > 
> > > VHOST_INVALID_FEATURE_BIT > > > > }; > > > > @@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, > > > > VirtQueue *vq) > > > > */ > > > > } > > > > > > > > -static void vuf_create_vqs(VirtIODevice *vdev) > > > > +static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq) > > > > { > > > > VHostUserFS *fs = VHOST_USER_FS(vdev); > > > > unsigned int i; > > > > @@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev) > > > > /* Hiprio queue */ > > > > fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, > > > > vuf_handle_output); > > > > +/* > > > > + * Notification queue. Feature negotiation happens later. So at > > > > this > > > > + * point of time we don't know if driver will use notification > > > > queue > > > > + * or not. > > > > + */ > > > > +if (notification_vq) { > > > > +fs->notification_vq = virtio_add_queue(vdev, > > > > fs->conf.queue_size, > > > > + vuf_handle_output); > > > > +} > > > > > > > > /* Request queues */ > > > > fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues); > > > > @@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev) > > > >vuf_handle_output); > > > > } > > > > > > > > -/* 1 high prio queue, plus the number configured */ > > > > -fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues; > > > > +/* 1 high prio queue, 1 notification queue plus the number > > > > configured */ > > > > +if (notification_vq) { > > > > +fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues; > > > > +} else { > > > > +fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues; > > > > +} > > > > fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, > > > > fs->vhost_dev.nvqs); > > > > } > > > > > > > > @@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev) > > > > virtio_delete_queue(fs->hiprio_vq); > > > > fs->hiprio_vq = NULL; > > > > > > > > +if (fs->notification_vq) { > > > > +virtio_delete_queue(fs->notification_vq); > > > > +} > > > > +fs->notification_vq = NULL; > > > > + 
> > > > for (i = 0; i < fs->conf.num_request_queues; i++) { > > > > virtio_delete_queue(fs->req_vqs[i]); > > > > } > > > > @@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice > > > > *vdev, > > > > { > > > > VHostUserFS *fs = VHOST_USER_FS(vdev); > > > > > > > > +virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION); > > > > + > > > > return vhost_get_features(&fs->vhost_dev, user_feature_bits, > > > > features); > > > > } > > > > > > > > +static void vuf_set_fea
Re: [PATCH v2 1/3] virtio: turn VIRTQUEUE_MAX_SIZE into a variable
On Mon, Oct 04, 2021 at 09:38:04PM +0200, Christian Schoenebeck wrote: > Refactor VIRTQUEUE_MAX_SIZE to effectively become a runtime > variable per virtio user. virtio user == virtio device model? > > Reasons: > > (1) VIRTQUEUE_MAX_SIZE should reflect the absolute theoretical > maximum queue size possible. Which is actually the maximum > queue size allowed by the virtio protocol. The appropriate > value for VIRTQUEUE_MAX_SIZE would therefore be 32768: > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006 > > Apparently VIRTQUEUE_MAX_SIZE was instead defined with a > more or less arbitrary value of 1024 in the past, which > limits the maximum transfer size with virtio to 4M > (more precise: 1024 * PAGE_SIZE, with the latter typically > being 4k). Being equal to IOV_MAX is a likely reason. Buffers with more iovecs than that cannot be passed to host system calls (sendmsg(2), pwritev(2), etc). > (2) Additionally the current value of 1024 poses a hidden limit, > invisible to guest, which causes a system hang with the > following QEMU error if guest tries to exceed it: > > virtio: too many write descriptors in indirect table I don't understand this point. 2.6.5 The Virtqueue Descriptor Table says: The number of descriptors in the table is defined by the queue size for this virtqueue: this is the maximum possible descriptor chain length. and 2.6.5.3.1 Driver Requirements: Indirect Descriptors says: A driver MUST NOT create a descriptor chain longer than the Queue Size of the device. Do you mean a broken/malicious guest driver that is violating the spec? That's not a hidden limit, it's defined by the spec. > (3) Unfortunately not all virtio users in QEMU would currently > work correctly with the new value of 32768. > > So let's turn this hard coded global value into a runtime > variable as a first step in this commit, configurable for each > virtio user by passing a corresponding value with virtio_init() > call. 
virtio_add_queue() already has an int queue_size argument, why isn't that enough to deal with the maximum queue size? There's probably a good reason for it, but please include it in the commit description. > > Signed-off-by: Christian Schoenebeck > --- > hw/9pfs/virtio-9p-device.c | 3 ++- > hw/block/vhost-user-blk.c | 2 +- > hw/block/virtio-blk.c | 3 ++- > hw/char/virtio-serial-bus.c| 2 +- > hw/display/virtio-gpu-base.c | 2 +- > hw/input/virtio-input.c| 2 +- > hw/net/virtio-net.c| 15 --- > hw/scsi/virtio-scsi.c | 2 +- > hw/virtio/vhost-user-fs.c | 2 +- > hw/virtio/vhost-user-i2c.c | 3 ++- > hw/virtio/vhost-vsock-common.c | 2 +- > hw/virtio/virtio-balloon.c | 4 ++-- > hw/virtio/virtio-crypto.c | 3 ++- > hw/virtio/virtio-iommu.c | 2 +- > hw/virtio/virtio-mem.c | 2 +- > hw/virtio/virtio-pmem.c| 2 +- > hw/virtio/virtio-rng.c | 2 +- > hw/virtio/virtio.c | 35 +++--- > include/hw/virtio/virtio.h | 5 - > 19 files changed, 57 insertions(+), 36 deletions(-) > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c > index 54ee93b71f..cd5d95dd51 100644 > --- a/hw/9pfs/virtio-9p-device.c > +++ b/hw/9pfs/virtio-9p-device.c > @@ -216,7 +216,8 @@ static void virtio_9p_device_realize(DeviceState *dev, > Error **errp) > } > > v->config_size = sizeof(struct virtio_9p_config) + strlen(s->fsconf.tag); > -virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size); > +virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size, > +VIRTQUEUE_MAX_SIZE); > v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output); > } > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > index ba13cb87e5..336f56705c 100644 > --- a/hw/block/vhost-user-blk.c > +++ b/hw/block/vhost-user-blk.c > @@ -491,7 +491,7 @@ static void vhost_user_blk_device_realize(DeviceState > *dev, Error **errp) > } > > virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, > -sizeof(struct virtio_blk_config)); > +sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE); > > s->virtqs = g_new(VirtQueue *, 
s->num_queues); > for (i = 0; i < s->num_queues; i++) { > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > index f139cd7cc9..9c0f46815c 100644 > --- a/hw/block/virtio-blk.c > +++ b/hw/block/virtio-blk.c > @@ -1213,7 +1213,8 @@ static void virtio_blk_device_realize(DeviceState *dev, > Error **errp) > > virtio_blk_set_config_size(s, s->host_features); > > -virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size); > +virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size, > +VIRTQUEUE_MAX_SIZE); > > s->blk = conf->conf.blk; > s->rq = NULL; > diff --git a/hw/char/virti
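The transfer-size figures quoted in this thread (4M for a queue size of 1024, 128M for 32768) follow from queue size times PAGE_SIZE. A small sketch of that arithmetic, assuming a 4k page as the commit message does; the helper name virtq_max_transfer_bytes is hypothetical and not a QEMU function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on a single virtio transfer when every
 * descriptor in the queue points at one page of page_size bytes.
 * The multiplication is done in 64 bits so it cannot wrap.
 */
uint64_t virtq_max_transfer_bytes(uint32_t queue_size, uint32_t page_size)
{
    return (uint64_t)queue_size * page_size;
}
```

With a page size of 4096, a queue size of 1024 gives 4 MiB and a queue size of 32768 gives 128 MiB, matching the numbers in the commit messages.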
Re: [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote: > On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote: > > Add a new custom threadpool using posix threads that specifically > > service locking requests. > > > > In the case of a fcntl(SETLKW) request, if the guest is waiting > > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd > > unblocks the blocked threads by sending a signal to them and waking > > them up. > > > > The current threadpool (GThreadPool) is not adequate to service the > > locking requests that result in a thread blocking. That is because > > GLib does not provide an API to cancel the request while it is > > serviced by a thread. In addition, a user might be running virtiofsd > > without a threadpool (--thread-pool-size=0), thus a locking request > > that blocks, will block the main virtqueue thread that services requests > > from servicing any other requests. > > > > The only exception occurs when the lock is of type F_UNLCK. In this case > > the request is serviced by the main virtqueue thread or a GThreadPool > > thread to avoid a deadlock, when all the threads in the custom threadpool > > are blocked. > > > > Then virtiofsd proceeds to cleanup the state of the threads, release > > them back to the system and re-initialize. > > Is there another way to cancel SETLKW without resorting to a new thread > pool? Since this only matters when shutting down or restarting, can we > close all plock->fd file descriptors to kick the GThreadPool workers out > of fnctl()? I don't think that closing plock->fd will unblock fcntl(). SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg) { struct fd f = fdget_raw(fd); } IIUC, fdget_raw() will take a reference on associated "struct file" and after that rest of the code will work with that "struct file". static int do_lock_file_wait(struct file *filp, unsigned int cmd, struct file_lock *fl) { .. .. 
error = wait_event_interruptible(fl->fl_wait, list_empty(&fl->fl_blocked_member)); .. .. } And this should break upon receiving a signal. And the man page says the same thing. F_OFD_SETLKW (struct flock *) As for F_OFD_SETLK, but if a conflicting lock is held on the file, then wait for that lock to be released. If a signal is caught while waiting, then the call is interrupted and (after the signal handler has returned) returns immediately (with return value -1 and errno set to EINTR; see signal(7)). It would be nice if we don't have to implement our own custom threadpool just for locking. Would have been better if glib thread pool provided some facility for this. [..] > > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c > > index 3b720c5d4a..c67c2e0e7a 100644 > > --- a/tools/virtiofsd/fuse_virtio.c > > +++ b/tools/virtiofsd/fuse_virtio.c > > @@ -20,6 +20,7 @@ > > #include "fuse_misc.h" > > #include "fuse_opt.h" > > #include "fuse_virtio.h" > > +#include "tpool.h" > > > > #include > > #include > > @@ -612,6 +613,60 @@ out: > > free(req); > > } > > > > +/* > > + * If the request is a locking request, use a custom locking thread pool. > > + */ > > +static bool use_lock_tpool(gpointer data, gpointer user_data) > > +{ > > +struct fv_QueueInfo *qi = user_data; > > +struct fuse_session *se = qi->virtio_dev->se; > > +FVRequest *req = data; > > +VuVirtqElement *elem = &req->elem; > > +struct fuse_buf fbuf = {}; > > +struct fuse_in_header *inhp; > > +struct fuse_lk_in *lkinp; > > +size_t lk_req_len; > > +/* The 'out' part of the elem is from qemu */ > > +unsigned int out_num = elem->out_num; > > +struct iovec *out_sg = elem->out_sg; > > +size_t out_len = iov_size(out_sg, out_num); > > +bool use_custom_tpool = false; > > + > > +/* > > + * If notifications are not enabled, no point in using cusotm lock > > + * thread pool.
> > + */ > > +if (!se->notify_enabled) { > > +return false; > > +} > > + > > +assert(se->bufsize > sizeof(struct fuse_in_header)); > > +lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in); > > + > > +if (out_len < lk_req_len) { > > +return false; > > +} > > + > > +fbuf.mem = g_malloc(se->bufsize); > > +copy_from_iov(&fbuf, out_num, out_sg, lk_req_len); > > This looks inefficient: for every FUSE request we now malloc se->bufsize > and then copy lk_req_len bytes, only to free the memory again. > > Is it possible to keep lk_req_len bytes on the stack instead? I guess it should be possible. se->bufsize is variable but lk_req_len is known at compile time. lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in); So we should be able to allocate this much space on stack and point fbuf.mem to it. char buf[
RE: [PATCH v3 9/9] vfio: defer to commit kvm irq routing when enable msi/msix
> -Original Message- > From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Saturday, October 2, 2021 7:05 AM > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) > > Cc: phi...@redhat.com; pbonz...@redhat.com; marcel.apfelb...@gmail.com; > m...@redhat.com; qemu-devel@nongnu.org; Gonglei (Arei) > ; chenjiashang > Subject: Re: [PATCH v3 9/9] vfio: defer to commit kvm irq routing when enable > msi/msix > > On Tue, 21 Sep 2021 07:02:02 +0800 > "Longpeng(Mike)" wrote: > > > In migration resume phase, all unmasked msix vectors need to be > > setup when load the VF state. However, the setup operation would > > s/load/loading/ > > > take longer if the VM has more VFs and each VF has more unmasked > > vectors. > > > > The hot spot is kvm_irqchip_commit_routes, it'll scan and update > > all irqfds that already assigned each invocation, so more vectors > > s/that/that are/ > > > means need more time to process them. > > > > vfio_pci_load_config > > vfio_msix_enable > > msix_set_vector_notifiers > > for (vector = 0; vector < dev->msix_entries_nr; vector++) { > > vfio_msix_vector_do_use > > vfio_add_kvm_msi_virq > > kvm_irqchip_commit_routes <-- expensive > > } > > > > We can reduce the cost by only commit once outside the loop. The > > s/commit/committing/ > OK, will fix in the next version, thanks. > > routes is cached in kvm_state, we commit them first and then bind > > s/is/are/ > OK. > > irqfd for each vector. > > > > The test VM has 128 vcpus and 8 VF (each one has 65 vectors), > > we measure the cost of the vfio_msix_enable for each VF, and > > we can see 90+% costs can be reduce. > > > > VF Count of irqfds[*] OriginalWith this patch > > > > 1st 658 2 > > 2nd 130 15 2 > > 3rd 195 22 2 > > 4th 260 24 3 > > 5th 325 36 2 > > 6th 390 44 3 > > 7th 455 51 3 > > 8th 520 58 4 > > Total 258ms 21ms > > > > [*] Count of irqfds > > How many irqfds that already assigned and need to process in this > > round. 
> > > > The optimition can be applied to msi type too. > > s/optimition/optimization/ > OK, thanks. > > > > Signed-off-by: Longpeng(Mike) > > --- > > hw/vfio/pci.c | 36 > > 1 file changed, 28 insertions(+), 8 deletions(-) > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > index 2de1cc5425..b26129bddf 100644 > > --- a/hw/vfio/pci.c > > +++ b/hw/vfio/pci.c > > @@ -513,11 +513,13 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, > unsigned int nr, > > * increase them as needed. > > */ > > if (vdev->nr_vectors < nr + 1) { > > -vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX); > > vdev->nr_vectors = nr + 1; > > -ret = vfio_enable_vectors(vdev, true); > > -if (ret) { > > -error_report("vfio: failed to enable vectors, %d", ret); > > +if (!vdev->defer_kvm_irq_routing) { > > +vfio_disable_irqindex(&vdev->vbasedev, > VFIO_PCI_MSIX_IRQ_INDEX); > > +ret = vfio_enable_vectors(vdev, true); > > +if (ret) { > > +error_report("vfio: failed to enable vectors, %d", ret); > > +} > > } > > } else { > > Error *err = NULL; > > @@ -579,8 +581,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, > unsigned int nr) > > } > > } > > > > -/* TODO: invoked when enclabe msi/msix vectors */ > > -static __attribute__((unused)) void vfio_commit_kvm_msi_virq(VFIOPCIDevice > *vdev) > > +static void vfio_commit_kvm_msi_virq(VFIOPCIDevice *vdev) > > { > > int i; > > VFIOMSIVector *vector; > > @@ -610,6 +611,9 @@ static __attribute__((unused)) void > vfio_commit_kvm_msi_virq(VFIOPCIDevice *vdev > > > > static void vfio_msix_enable(VFIOPCIDevice *vdev) > > { > > +PCIDevice *pdev = &vdev->pdev; > > +int ret; > > + > > vfio_disable_interrupts(vdev); > > > > vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries); > > @@ -632,11 +636,22 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev) > > vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL); > > vfio_msix_vector_release(&vdev->pdev, 0); > > > > -if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use, > > - 
vfio_msix_vector_release, NULL)) { > > A comment would be useful here, maybe something like: > > /* > * Setting vector notifiers triggers synchronous vector-use > * callbacks for each active vector. Deferring to commit the KVM > * routes once rather than per vector provides a substantial > * performance improvement. > */ > Will add in the nex
Re: [PATCH 2/3] vdpa: Add vhost_vdpa_section_end
On Tue, Oct 5, 2021 at 12:47 PM Michael S. Tsirkin wrote:
>
> On Tue, Oct 05, 2021 at 11:52:37AM +0200, Eugenio Perez Martin wrote:
> > On Tue, Oct 5, 2021 at 10:15 AM Michael S. Tsirkin wrote:
> > >
> > > On Tue, Oct 05, 2021 at 10:01:30AM +0200, Eugenio Pérez wrote:
> > > > Abstract this operation, that will be reused when validating the region
> > > > against the iova range that the device supports.
> > > >
> > > > Signed-off-by: Eugenio Pérez
> > >
> > > Note that as defined end is actually 1 byte beyond end of section.
> > > As such it can e.g. overflow if cast to u64.
> > > So be careful to use int128 ops with it.
> >
> > You are right, but this is only the result of extracting "llend"
> > calculation in its own function, since it is going to be used a third
> > time in the next commit. This next commit contains a mistake because
> > of this, as you pointed out.
> >
> > Since "last" would be a very misleading name, do you think we could
> > give a better name / type to it?
> >
> > > Also - document?
> >
> > It will be documented with that ("It returns one byte beyond end of
> > section" or similar) too.
> >
> > Thanks!
>
> that's how c++ containers work so maybe it's not too bad as long
> as we document this carefully.
>

I tend to see it that way too, except when the name is "last", which I
read as "last one addressable/valid", as discussed in the
VHOST_VDPA_GET_IOVA_RANGE call mail thread. So end = one past the range,
last = last valid one.

It would be great to have something like void * / hwaddr, or c++ chrono
time_point vs time_point, that moves the check against mixing different
range types into the type system. But this may be overthinking it at this
moment.

Thanks!
> > > > > > > --- > > > > hw/virtio/vhost-vdpa.c | 18 +++--- > > > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > > > index ea1aa71ad8..a1de6c7c9c 100644 > > > > --- a/hw/virtio/vhost-vdpa.c > > > > +++ b/hw/virtio/vhost-vdpa.c > > > > @@ -24,6 +24,15 @@ > > > > #include "trace.h" > > > > #include "qemu-common.h" > > > > > > > > +static Int128 vhost_vdpa_section_end(const MemoryRegionSection > > > > *section) > > > > +{ > > > > +Int128 llend = int128_make64(section->offset_within_address_space); > > > > +llend = int128_add(llend, section->size); > > > > +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > > > + > > > > +return llend; > > > > +} > > > > + > > > > static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection > > > > *section) > > > > { > > > > return (!memory_region_is_ram(section->mr) && > > > > @@ -160,10 +169,7 @@ static void > > > > vhost_vdpa_listener_region_add(MemoryListener *listener, > > > > } > > > > > > > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > > > > -llend = int128_make64(section->offset_within_address_space); > > > > -llend = int128_add(llend, section->size); > > > > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > > > - > > > > +llend = vhost_vdpa_section_end(section); > > > > if (int128_ge(int128_make64(iova), llend)) { > > > > return; > > > > } > > > > @@ -221,9 +227,7 @@ static void > > > > vhost_vdpa_listener_region_del(MemoryListener *listener, > > > > } > > > > > > > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > > > > -llend = int128_make64(section->offset_within_address_space); > > > > -llend = int128_add(llend, section->size); > > > > -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > > > > +llend = vhost_vdpa_section_end(section); > > > > > > > > trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend)); > > > > > > > > -- > > > > 2.27.0 > > 
> >
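The "end" versus "last" distinction discussed in this thread can be sketched in plain C. This is only an illustration, not QEMU code: `struct range64`, `range_end_wraps()` and `range_last()` are hypothetical names, and plain `uint64_t` arithmetic stands in for QEMU's Int128 helpers to show why a one-past-the-end value does not always fit in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory section: start offset and size. */
struct range64 {
    uint64_t offset;
    uint64_t size;   /* assumed non-zero */
};

/*
 * "end" is one past the last byte. For a section that reaches the top of
 * the 64-bit address space, offset + size equals 2^64 and wraps to 0, so
 * the sum compares lower than the offset it started from.
 */
static bool range_end_wraps(struct range64 r)
{
    return r.offset + r.size < r.offset;
}

/*
 * "last" is the last valid byte. It fits in 64 bits whenever the section
 * itself lies inside the address space.
 */
static uint64_t range_last(struct range64 r)
{
    return r.offset + (r.size - 1);
}
```

A section of 0x1000 bytes starting at 0xFFFFFFFFFFFFF000 makes `range_end_wraps()` return true while `range_last()` still yields a representable value, which is the case the review comment warns about.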
Re: [PATCH v2 1/3] virtio: turn VIRTQUEUE_MAX_SIZE into a variable
On Dienstag, 5. Oktober 2021 14:45:56 CEST Stefan Hajnoczi wrote: > On Mon, Oct 04, 2021 at 09:38:04PM +0200, Christian Schoenebeck wrote: > > Refactor VIRTQUEUE_MAX_SIZE to effectively become a runtime > > variable per virtio user. > > virtio user == virtio device model? Yes > > Reasons: > > > > (1) VIRTQUEUE_MAX_SIZE should reflect the absolute theoretical > > > > maximum queue size possible. Which is actually the maximum > > queue size allowed by the virtio protocol. The appropriate > > value for VIRTQUEUE_MAX_SIZE would therefore be 32768: > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.h > > tml#x1-240006 > > > > Apparently VIRTQUEUE_MAX_SIZE was instead defined with a > > more or less arbitrary value of 1024 in the past, which > > limits the maximum transfer size with virtio to 4M > > (more precise: 1024 * PAGE_SIZE, with the latter typically > > being 4k). > > Being equal to IOV_MAX is a likely reason. Buffers with more iovecs than > that cannot be passed to host system calls (sendmsg(2), pwritev(2), > etc). Yes, that's use case dependent. Hence the solution to opt-in if it is desired and feasible. > > (2) Additionally the current value of 1024 poses a hidden limit, > > > > invisible to guest, which causes a system hang with the > > following QEMU error if guest tries to exceed it: > > > > virtio: too many write descriptors in indirect table > > I don't understand this point. 2.6.5 The Virtqueue Descriptor Table says: > > The number of descriptors in the table is defined by the queue size for > this virtqueue: this is the maximum possible descriptor chain length. > > and 2.6.5.3.1 Driver Requirements: Indirect Descriptors says: > > A driver MUST NOT create a descriptor chain longer than the Queue Size of > the device. > > Do you mean a broken/malicious guest driver that is violating the spec? > That's not a hidden limit, it's defined by the spec. 
https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg00781.html
https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg00788.html

You can already go beyond that queue size at runtime with the indirection
table. The only actual limit is the currently hard coded value of 1k pages.
Hence the suggestion to turn that into a variable.

> > (3) Unfortunately not all virtio users in QEMU would currently
> >
> > work correctly with the new value of 32768.
> >
> > So let's turn this hard coded global value into a runtime
> > variable as a first step in this commit, configurable for each
> > virtio user by passing a corresponding value with virtio_init()
> > call.
>
> virtio_add_queue() already has an int queue_size argument, why isn't
> that enough to deal with the maximum queue size? There's probably a good
> reason for it, but please include it in the commit description.

[...]

> Can you make this value per-vq instead of per-vdev since virtqueues can
> have different queue sizes?
>
> The same applies to the rest of this patch. Anything using
> vdev->queue_max_size should probably use vq->vring.num instead.

I would like to avoid that and keep it per device. The maximum size stored
there is the maximum size supported by the virtio user (or virtio device
model, however you want to call it). So that's really a limit per device,
not per queue, as no queue of the device would ever exceed that limit.

Plus a lot more code would need to be refactored, which I think is
unnecessary.

Best regards,
Christian Schoenebeck
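The transfer-size figures quoted in this thread follow from multiplying the queue size by the page size. A minimal sketch, assuming 4k pages and one page-sized buffer per descriptor; `max_transfer_bytes()` is a hypothetical helper for illustration, not a QEMU function:

```c
#include <stdint.h>

/*
 * Upper bound on a single transfer, assuming every descriptor in the
 * queue points at exactly one page-sized buffer.
 */
static uint64_t max_transfer_bytes(uint32_t queue_size, uint32_t page_size)
{
    /* Widen before multiplying so large queue sizes cannot overflow. */
    return (uint64_t)queue_size * page_size;
}
```

Under these assumptions, `max_transfer_bytes(1024, 4096)` gives 4M and `max_transfer_bytes(32768, 4096)` gives 128M, matching the old and proposed limits named in the commit messages.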
[PATCH] docs: Add spec of OVMF GUIDed table for SEV guests
Add docs/specs/sev-guest-firmware.rst which describes the GUIDed table in the end of OVMF's image which is parsed by QEMU, and currently used to describe some values for SEV and SEV-ES guests. Signed-off-by: Dov Murik --- docs/specs/index.rst | 1 + docs/specs/sev-guest-firmware.rst | 125 ++ 2 files changed, 126 insertions(+) create mode 100644 docs/specs/sev-guest-firmware.rst diff --git a/docs/specs/index.rst b/docs/specs/index.rst index ecc43896bb..2a35700fb3 100644 --- a/docs/specs/index.rst +++ b/docs/specs/index.rst @@ -18,3 +18,4 @@ guest hardware that is specific to QEMU. acpi_mem_hotplug acpi_pci_hotplug acpi_nvdimm + sev-guest-firmware diff --git a/docs/specs/sev-guest-firmware.rst b/docs/specs/sev-guest-firmware.rst new file mode 100644 index 00..3f7f082df5 --- /dev/null +++ b/docs/specs/sev-guest-firmware.rst @@ -0,0 +1,125 @@ + +QEMU/Guest Firmware Interface for AMD SEV and SEV-ES + + +Overview + + +The guest firmware image (OVMF) may contain some configuration entries +which are used by QEMU before the guest launches. These are listed in a +GUIDed table at a known location in the firmware image. QEMU parses +this table when it loads the firmware image into memory, and then QEMU +reads individual entries when their values are needed. + +Though nothing in the table structure is SEV-specific, currently all the +entries in the table are related to SEV and SEV-ES features. + + +Table parsing in QEMU +- + +The table is parsed from the footer: first the presence of the table +footer GUID (96b582de-1fb2-45f7-baea-a366c55a082d) at 0xffd0 is +verified. If that is found, two bytes at 0xffce are the entire +table length. + +Then the table is scanned backwards looking for the specific entry GUID. + +QEMU files related to parsing and scanning the OVMF table: + - ``hw/i386/pc_sysfw_ovmf.c`` + +The edk2 firmware code that constructs this structure is in the +`OVMF Reset Vector file`_. 
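+
+
+Table memory layout
+-------------------
+
++--------+--------+-----------------------------------------+
+| GPA    | Length | Description                             |
++========+========+=========================================+
+| 0xff80 | 4      | Zero padding                            |
++--------+--------+-----------------------------------------+
+| 0xff84 | 4      | SEV hashes table base address           |
++--------+--------+-----------------------------------------+
+| 0xff88 | 4      | SEV hashes table size (=0x400)          |
++--------+--------+-----------------------------------------+
+| 0xff8c | 2      | SEV hashes table entry length (=0x1a)   |
++--------+--------+-----------------------------------------+
+| 0xff8e | 16     | SEV hashes table GUID:                  |
+|        |        | 7255371f-3a3b-4b04-927b-1da6efa8d454    |
++--------+--------+-----------------------------------------+
+| 0xff9e | 4      | SEV secret block base address           |
++--------+--------+-----------------------------------------+
+| 0xffa2 | 4      | SEV secret block size (=0xc00)          |
++--------+--------+-----------------------------------------+
+| 0xffa6 | 2      | SEV secret block entry length (=0x1a)   |
++--------+--------+-----------------------------------------+
+| 0xffa8 | 16     | SEV secret block GUID:                  |
+|        |        | 4c2eb361-7d9b-4cc3-8081-127c90d3d294    |
++--------+--------+-----------------------------------------+
+| 0xffb8 | 4      | SEV-ES AP reset RIP                     |
++--------+--------+-----------------------------------------+
+| 0xffbc | 2      | SEV-ES reset block entry length (=0x16) |
++--------+--------+-----------------------------------------+
+| 0xffbe | 16     | SEV-ES reset block entry GUID:          |
+|        |        | 00f771de-1a7e-4fcb-890e-68c77e2fb44e    |
++--------+--------+-----------------------------------------+
+| 0xffce | 2      | Length of entire table including table  |
+|        |        | footer GUID and length (=0x72)          |
++--------+--------+-----------------------------------------+
+| 0xffd0 | 16     | OVMF GUIDed table footer GUID:          |
+|        |        | 96b582de-1fb2-45f7-baea-a366c55a082d    |
++--------+--------+-----------------------------------------+
+| 0xffe0 | 8      | Application processor entry point code  |
++--------+--------+-----------------------------------------+
+| 0xffe8 | 8      | "\0\0\0\0VTF\0"                         |
++--------+--------+-----------------------------------------+
+| 0xfff0 | 16     | Reset vector code                       |
++--------+--------+-----------------------------------------+
+
+
+T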
Re: [PATCH] monitor: Tidy up find_device_state()
On 9/16/21 13:17, Markus Armbruster wrote:

Commit 6287d827d4 "monitor: allow device_del to accept QOM paths" extended find_device_state() to accept QOM paths in addition to qdev IDs. This added a checked conversion to TYPE_DEVICE at the end, which duplicates the check done for the qdev ID case earlier, except it sets a *different* error: GenericError "ID is not a hotpluggable device" when passed a QOM path, and DeviceNotFound "Device 'ID' not found" when passed a qdev ID. Fortunately, the latter won't happen as long as we add only devices to /machine/peripheral/.

Earlier, commit b6cc36abb2 "qdev: device_del: Search for to be unplugged device in 'peripheral' container" rewrote the lookup by qdev ID to use QOM instead of qdev_find_recursive(), so it can handle bus-less devices. It does so by constructing an absolute QOM path. Works, but object_resolve_path_component() is easier. Switching to it also gets rid of the unclean duplication described above.

While there, avoid converting to TYPE_DEVICE twice, first to check whether it's possible, and then for real.
Signed-off-by: Markus Armbruster Reviewed-by: Damien Hedde --- softmmu/qdev-monitor.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c index a304754ab9..d1ab3c25fb 100644 --- a/softmmu/qdev-monitor.c +++ b/softmmu/qdev-monitor.c @@ -831,16 +831,12 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp) static DeviceState *find_device_state(const char *id, Error **errp) { Object *obj; +DeviceState *dev; if (id[0] == '/') { obj = object_resolve_path(id, NULL); } else { -char *root_path = object_get_canonical_path(qdev_get_peripheral()); -char *path = g_strdup_printf("%s/%s", root_path, id); - -g_free(root_path); -obj = object_resolve_path_type(path, TYPE_DEVICE, NULL); -g_free(path); +obj = object_resolve_path_component(qdev_get_peripheral(), id); } if (!obj) { @@ -849,12 +845,13 @@ static DeviceState *find_device_state(const char *id, Error **errp) return NULL; } -if (!object_dynamic_cast(obj, TYPE_DEVICE)) { +dev = (DeviceState *)object_dynamic_cast(obj, TYPE_DEVICE); +if (!dev) { error_setg(errp, "%s is not a hotpluggable device", id); return NULL; } -return DEVICE(obj); +return dev; } void qdev_unplug(DeviceState *dev, Error **errp)
Re: [PATCH 12/13] virtiofsd: Implement blocking posix locks
On Mon, Oct 04, 2021 at 04:07:04PM +0100, Stefan Hajnoczi wrote: > On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote: > > As of now we don't support fcntl(F_SETLKW) and if we see one, we return > > -EOPNOTSUPP. > > > > Change that by accepting these requests and returning a reply > > immediately asking caller to wait. Once lock is available, send a > > notification to the waiter indicating lock is available. > > > > In response to lock request, we are returning error value as "1", which > > signals to client to queue the lock request internally and later client > > will get a notification which will signal lock is taken (or error). And > > then fuse client should wake up the guest process. > > > > Signed-off-by: Vivek Goyal > > Signed-off-by: Ioannis Angelakopoulos > > --- > > tools/virtiofsd/fuse_lowlevel.c | 37 - > > tools/virtiofsd/fuse_lowlevel.h | 26 > > tools/virtiofsd/fuse_virtio.c| 50 --- > > tools/virtiofsd/passthrough_ll.c | 70 > > 4 files changed, 167 insertions(+), 16 deletions(-) > > > > diff --git a/tools/virtiofsd/fuse_lowlevel.c > > b/tools/virtiofsd/fuse_lowlevel.c > > index e4679c73ab..2e7f4b786d 100644 > > --- a/tools/virtiofsd/fuse_lowlevel.c > > +++ b/tools/virtiofsd/fuse_lowlevel.c > > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int > > error, struct iovec *iov, > > .unique = req->unique, > > .error = error, > > }; > > - > > -if (error <= -1000 || error > 0) { > > +/* error = 1 has been used to signal client to wait for notificaiton */ > > s/notificaiton/notification/ Will fix. I have made too many spelling mistakes. 
:-(
>
> > +    if (error <= -1000 || error > 1) {
> >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> >          out.error = -ERANGE;
> >      }
> > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> >      return send_reply(req, -err, NULL, 0);
> >  }
> >
> > +int fuse_reply_wait(fuse_req_t req)
> > +{
> > +    return send_reply(req, 1, NULL, 0);
> > +}
> > +
> >  void fuse_reply_none(fuse_req_t req)
> >  {
> >      fuse_free_req(req);
> > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t
> > nodeid,
> >      send_reply_ok(req, NULL, 0);
> >  }
> >
> > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > +                           struct iovec *iov, int count)
> > +{
> > +    struct fuse_out_header out;
> > +    if (!se->got_init) {
> > +        return -ENOTCONN;
> > +    }
> > +    out.unique = 0;
> > +    out.error = notify_code;
>
> Please fully initialize all fuse_out_header fields so it's obvious that
> there is no accidental information leak from virtiofsd to the guest:
>
>   struct fuse_out_header out = {
>       .error = notify_code,
>   };
>
> The host must not expose uninitialized memory to the guest (just like
> the kernel vs userspace). fuse_send_msg() initializes out.len later, but
> to be on the safe side I think we should be explicit here.

Agreed. It's better to be explicit here and initialize fuse_out_header
fully. Will do.

Vivek
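The review point about full initialization can be illustrated with a stand-alone sketch. `struct out_header` below is a hypothetical stand-in carrying the three fields mentioned in this thread (`len`, `error`, `unique`), not the real FUSE definition, and `make_notify_header()` is likewise invented for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a reply header sent to the guest. */
struct out_header {
    uint32_t len;
    int32_t  error;
    uint64_t unique;
};

static struct out_header make_notify_header(int32_t notify_code)
{
    /*
     * Designated initializer: the named member gets its value and every
     * member that is not named is zero-initialized, so no stale stack
     * contents end up in the header.
     */
    struct out_header out = {
        .error = notify_code,
    };

    return out;
}
```

Compared with declaring the struct and assigning fields one by one, this form leaves no window in which a later-added field could be forgotten.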
Re: Moving QEMU downloads to GitLab Releases?
On Mon, Oct 04, 2021 at 02:34:49PM -0500, Michael Roth wrote: > Quoting Stefan Hajnoczi (2021-10-04 04:01:22) > > On Fri, Oct 01, 2021 at 09:39:13AM +0200, Philippe Mathieu-Daudé wrote: > > > On 9/30/21 15:40, Stefan Hajnoczi wrote: > > > > Hi Mike, > > > > QEMU downloads are currently hosted on qemu.org's Apache web server. > > > > Paolo and I were discussing ways to reduce qemu.org network traffic to > > > > save money and eventually turn off the qemu.org server since there is no > > > > full-time sysadmin for it. I'd like to discuss moving QEMU downloads to > > > > GitLab Releases. > > > > > > > > Since you create and sign QEMU releases I wanted to see what you think > > > > about the idea. GitLab Releases has two ways of creating release assets: > > > > archiving a git tree and attaching arbitrary binaries. The > > > > scripts/make-release script fetches submodules and generates version > > > > files, so it may be necessary to treat QEMU tarballs as arbitrary > > > > binaries instead of simply letting GitLab create git tree archives: > > > > https://docs.gitlab.com/ee/user/project/releases/#use-a-generic-package-for-attaching-binaries > > > > > > > > Releases can be uploaded via the GitLab API from your local machine or > > > > deployed as a GitLab CI job. Uploading from your local machine would be > > > > the closest to the current workflow. > > > > > > > > In the long term we could have a CI job that automatically publishes > > > > QEMU releases when a new qemu.git tag is pushed. The release process > > > > could be fully automated so that manual steps are no longer necessary, > > > > although we'd have to trust GitLab with QEMU GPG signing keys. > > > > > > Before having to trust a SaaS for GPG signing, could this work? > > > > > > - make-release script should produce a reproducible tarball in a > > > gitlab job, along with a file containing the tarball hash. 
> > > > > > - Mike (or whoever is responsible of releases) keeps doing a local > > > manual build > > > > > > - Mike checks the local hash matches the Gitlab artifact hash > > > > > > - Mike signs its local build/hash and uses the GitLab API to upload > > > that .sig to job artifacts > > > > > > - we can have an extra manual job that checks the tarball.sig > > > (hash and pubkey) and on success deploys updating the download > > > page, adding the new release > > > > I wonder what Mike sees as the way forward. > > Hi Stefan, Philippe, > > In general I like the idea, since we could also have the CI do the full > gamut of testing against the binaries built from said tarball, so the > Release Person can just regenerate the tarball and provide a sig to > attest that it came from the proper sources. Currently I do make check > and make check-acceptance and a few other sanity checks, which I guess > would no longer be needed then. > > But I think the more immediate issue is where/how to host those > tarballs. Even moving all the ROMs/capstone out of the source tree still > results in an xz-compressed tarball size ~25MB, which is well above the > 10MB limit mentioned earlier. We could break it out into target-specific > tarballs, maybe further into softmmu/user variants, but that sounds > painful for both users and maintainers who need to deal with the > resulting source tree differences. > > What I'm wondering is whether we could just use the archive files > generated by gitlab when we tag our releases? E.g.: > > https://gitlab.com/qemu-project/qemu/-/archive/v6.1.0/qemu-v6.1.0.tar.bz2 > > If we paired that with an in-tree script similar to make-release for > users to download individual ROM sources/subprojects used for a particular > tagged release, would that be sufficient for GPL compliance and verifying > what sources the binaries were built from? Are there any other > considerations WRT ROMs/etc.? 
> > With something like that in place, Release Person could just do a git > checkout, verify the Maintainer's sig/tag (in case we don't necessarily > trust the git host), generate the tarball, verify the hash matches what > gitlab published (or verify/diff individual files if the bz/gz hashes > require a specific environment), then sign the gitlab tarball and add > the sig to qemu.org download page along with a link the gitlab-generated > tarball. > > We could also publish the Maintainer and Release Person public keys on > qemu.org download page so users can verify this as well using the same > process. > > Users that want the additional sources can then do it locally via > above-mentioned script, which would be part of the now-signed tarball > and so could be 'trusted' assuming the individual project hosts weren't > compromised (which is still an assumption that's needed with the current > process anyway). > > I guess the main question is who is using the ROM/BIOS sources in the > tarballs, and would this 2-step process work for those users? If there > are distros relying on them then maybe there are some more logistics
Re: [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
On Mon, Oct 04, 2021 at 04:01:02PM +0100, Stefan Hajnoczi wrote: > On Thu, Sep 30, 2021 at 11:30:35AM -0400, Vivek Goyal wrote: > > So far we did not have the notion of cross queue traffic. That is, we > > get request on a queue and send back response on same queue. So if a > > request be being processed and at the same time a stop queue request > > comes in, we wait for all pending requests to finish and then queue > > is stopped and associated data structure cleaned. > > > > But with notification queue, now it is possible that we get a locking > > request on request queue and send the notification back on a different > > queue (notificaiton queue). This means, we need to make sure that > > s/notificaiton/notification/ > > > notifiation queue has not already been shutdown or is not being > > s/notifiation/notification/ Will fix both. [..] > > /* Callback from libvhost-user on start or stop of a queue */ > > @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, > > bool started) > > * the queue thread doesn't block in virtio_send_msg(). > > */ > > vu_dispatch_unlock(vud); > > -fv_queue_cleanup_thread(vud, qidx); > > + > > +/* > > + * If queue 0 is being shutdown, treat it as if device is being > > + * shutdown and stop all queues. > > + */ > > Please expand this comment so it's clear why we do this. Ok, will do. I put the justification in commit message but it is a good idea to put it here as well. Vivek
[PATCH v2] hw/usb/vt82c686-uhci-pci: Use ISA instead of PCI interrupts
This device is part of a superio/ISA bridge chip and IRQs from it are
routed to an ISA interrupt set by the Interrupt Line PCI config
register. Change uhci_update_irq() to allow this and use it from
vt82c686-uhci-pci.

Signed-off-by: BALATON Zoltan
Reviewed-by: Jiaxun Yang
---
v2: Do it differently to confine isa reference to vt82c686-uhci-pci as
hcd-uhci is also used on machines that don't have isa. Left Jiaxun's
R-b there as he checked it's the same for VT82C686B and gave R-b for
the approach which still holds but speak up if you think otherwise.

 hw/usb/hcd-uhci.c          | 15 +--
 hw/usb/hcd-uhci.h          |  8 +---
 hw/usb/vt82c686-uhci-pci.c | 10 ++
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 0cb02a6432..7924cfffdb 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -288,9 +288,14 @@ static UHCIAsync *uhci_async_find_td(UHCIState *s, uint32_t td_addr)
     return NULL;
 }
 
+static void uhci_pci_set_irq(UHCIState *s, int level)
+{
+    pci_set_irq(&s->dev, level);
+}
+
 static void uhci_update_irq(UHCIState *s)
 {
-    int level;
+    int level = 0;
     if (((s->status2 & 1) && (s->intr & (1 << 2))) ||
         ((s->status2 & 2) && (s->intr & (1 << 3))) ||
         ((s->status & UHCI_STS_USBERR) && (s->intr & (1 << 0))) ||
@@ -298,10 +303,8 @@ static void uhci_update_irq(UHCIState *s)
         (s->status & UHCI_STS_HSERR) ||
         (s->status & UHCI_STS_HCPERR)) {
         level = 1;
-    } else {
-        level = 0;
     }
-    pci_set_irq(&s->dev, level);
+    s->set_irq(s, level);
 }
 
 static void uhci_reset(DeviceState *dev)
@@ -1170,9 +1173,9 @@ void usb_uhci_common_realize(PCIDevice *dev, Error **errp)
 
     pci_conf[PCI_CLASS_PROG] = 0x00;
     /* TODO: reset value should be 0. */
-    pci_conf[USB_SBRN] = USB_RELEASE_1; // release number
-
+    pci_conf[USB_SBRN] = USB_RELEASE_1; /* release number */
     pci_config_set_interrupt_pin(pci_conf, u->info.irq_pin + 1);
+    s->set_irq = uhci_pci_set_irq;
 
     if (s->masterbus) {
         USBPort *ports[NB_PORTS];
diff --git a/hw/usb/hcd-uhci.h b/hw/usb/hcd-uhci.h
index e61d8fcb19..ecd19762d6 100644
--- a/hw/usb/hcd-uhci.h
+++ b/hw/usb/hcd-uhci.h
@@ -42,7 +42,9 @@ typedef struct UHCIPort {
     uint16_t ctrl;
 } UHCIPort;
 
-typedef struct UHCIState {
+typedef struct UHCIState UHCIState;
+
+struct UHCIState {
     PCIDevice dev;
     MemoryRegion io_bar;
     USBBus bus; /* Note unused when we're a companion controller */
@@ -60,7 +62,7 @@ typedef struct UHCIState {
     uint32_t frame_bandwidth;
     bool completions_only;
     UHCIPort ports[NB_PORTS];
-
+    void (*set_irq)(UHCIState *s, int level);
     /* Interrupts that should be raised at the end of the current frame. */
     uint32_t pending_int_mask;
 
@@ -72,7 +74,7 @@ typedef struct UHCIState {
     char *masterbus;
     uint32_t firstport;
     uint32_t maxframes;
-} UHCIState;
+};
 
 #define TYPE_UHCI "pci-uhci-usb"
 DECLARE_INSTANCE_CHECKER(UHCIState, UHCI, TYPE_UHCI)
diff --git a/hw/usb/vt82c686-uhci-pci.c b/hw/usb/vt82c686-uhci-pci.c
index b109c21603..f6bae704be 100644
--- a/hw/usb/vt82c686-uhci-pci.c
+++ b/hw/usb/vt82c686-uhci-pci.c
@@ -1,6 +1,15 @@
 #include "qemu/osdep.h"
+#include "hw/irq.h"
 #include "hcd-uhci.h"
 
+static void uhci_isa_set_irq(UHCIState *s, int level)
+{
+    uint8_t irq = pci_get_byte(s->dev.config + PCI_INTERRUPT_LINE);
+    if (irq > 0 && irq < 15) {
+        qemu_set_irq(isa_get_irq(NULL, irq), level);
+    }
+}
+
 static void usb_uhci_vt82c686b_realize(PCIDevice *dev, Error **errp)
 {
     UHCIState *s = UHCI(dev);
@@ -14,6 +23,7 @@ static void usb_uhci_vt82c686b_realize(PCIDevice *dev, Error **errp)
     pci_set_long(pci_conf + 0xc0, 0x2000);
 
     usb_uhci_common_realize(dev, errp);
+    s->set_irq = uhci_isa_set_irq;
 }
 
 static UHCIInfo uhci_info[] = {
--
2.21.4
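The patch above routes interrupt delivery through a per-device callback so that one device variant can override the default. A reduced sketch of that pattern, outside QEMU and with hypothetical names throughout (`struct ctrl`, `default_set_irq()`, `isa_set_irq()` and the `last_route`/`last_level` bookkeeping fields are assumptions made for illustration, not part of the patch):

```c
#include <stdint.h>

struct ctrl;
typedef void (*set_irq_fn)(struct ctrl *c, int level);

struct ctrl {
    uint8_t irq_line;     /* stand-in for the Interrupt Line register */
    set_irq_fn set_irq;   /* delivery hook, chosen at init time */
    int last_route;       /* 0 = nothing delivered, 1 = default, 2 = ISA */
    int last_level;
};

/* Default delivery path, used unless a variant overrides the hook. */
static void default_set_irq(struct ctrl *c, int level)
{
    c->last_route = 1;
    c->last_level = level;
}

/* Variant path: deliver only when the configured line is in 1..14. */
static void isa_set_irq(struct ctrl *c, int level)
{
    if (c->irq_line > 0 && c->irq_line < 15) {
        c->last_route = 2;
        c->last_level = level;
    }
}

static void ctrl_common_init(struct ctrl *c)
{
    c->irq_line = 0;
    c->last_route = 0;
    c->last_level = 0;
    c->set_irq = default_set_irq;
}

/* The variant runs the common init first, then replaces the hook. */
static void ctrl_isa_init(struct ctrl *c)
{
    ctrl_common_init(c);
    c->set_irq = isa_set_irq;
}

/* Shared code never names a delivery path; it only calls the hook. */
static void ctrl_update_irq(struct ctrl *c, int level)
{
    c->set_irq(c, level);
}
```

The order in `ctrl_isa_init()` mirrors the patch: the override has to come after the common initialization, otherwise the common code would reset the hook to the default.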
Re: [Virtio-fs] [PATCH 07/13] virtiofsd: Release file locks using F_UNLCK
On 2021-09-30 at 11:30 -04, Vivek Goyal wrote... > We are emulating posix locks for guest using open file description locks > in virtiofsd. When any of the fd is closed in guest, we find associated > OFD lock fd (if there is one) and close it to release all the locks. > > Assumption here is that there is no other thread using lo_inode_plock > structure or plock->fd, hence it is safe to do so. > > But now we are about to introduce blocking variant of locks (SETLKW), > and that means we might be waiting to a lock to be available and > using plock->fd. And that means there are still users of plock > structure. > > So release locks using fcntl(SETLK, F_UNLCK) instead of closing fd > and plock will be freed later when lo_inode is being freed. > > Signed-off-by: Vivek Goyal > Signed-off-by: Ioannis Angelakopoulos > --- > tools/virtiofsd/passthrough_ll.c | 21 + > 1 file changed, 17 insertions(+), 4 deletions(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c > b/tools/virtiofsd/passthrough_ll.c > index 38b2af8599..6928662e22 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -1557,9 +1557,6 @@ static void unref_inode(struct lo_data *lo, struct > lo_inode *inode, uint64_t n) > lo_map_remove(&lo->ino_map, inode->fuse_ino); > g_hash_table_remove(lo->inodes, &inode->key); > if (lo->posix_lock) { > -if (g_hash_table_size(inode->posix_locks)) { > -fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n"); > -} > g_hash_table_destroy(inode->posix_locks); > pthread_mutex_destroy(&inode->plock_mutex); > } > @@ -2266,6 +2263,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, > struct fuse_file_info *fi) > (void)ino; > struct lo_inode *inode; > struct lo_data *lo = lo_data(req); > +struct lo_inode_plock *plock; > +struct flock flock; > > inode = lo_inode(req, ino); > if (!inode) { > @@ -2282,8 +2281,22 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, > struct fuse_file_info *fi) > /* An fd is going away. 
Cleanup associated posix locks */ > if (lo->posix_lock) { > pthread_mutex_lock(&inode->plock_mutex); > -g_hash_table_remove(inode->posix_locks, I'm curious why the g_hash_table_remove above is not in the 'if' below? > +plock = g_hash_table_lookup(inode->posix_locks, > GUINT_TO_POINTER(fi->lock_owner)); > + > +if (plock) { > +/* > + * An fd is being closed. For posix locks, this means > + * drop all the associated locks. > + */ > +memset(&flock, 0, sizeof(struct flock)); > +flock.l_type = F_UNLCK; > +flock.l_whence = SEEK_SET; > +/* Unlock whole file */ > +flock.l_start = flock.l_len = 0; > +fcntl(plock->fd, F_OFD_SETLK, &flock); > +} > + > pthread_mutex_unlock(&inode->plock_mutex); > } > res = close(dup(lo_fi_fd(req, fi))); -- Cheers, Christophe de Dinechin (IRC c3d)
[PATCH v2 1/3] vdpa: Skip protected ram IOMMU mappings
Following the logic of commit 56918a126ae ("memory: Add RAM_PROTECTED
flag to skip IOMMU mappings") with VFIO, skip memory sections
inaccessible via normal mechanisms, including DMA.

Signed-off-by: Eugenio Pérez
---
 hw/virtio/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 47d7a5a23d..ea1aa71ad8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -28,6 +28,7 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
+           memory_region_is_protected(section->mr) ||
            /* vhost-vDPA doesn't allow MMIO to be mapped */
            memory_region_is_ram_device(section->mr) ||
            /*
--
2.27.0
Re: [PATCH v3 6/6] tests/qapi-schema: Test cases for aliases
Kevin Wolf writes:

> On 02.10.2021 at 15:33, Markus Armbruster wrote:
>> I apologize for this wall of text. It's a desperate attempt to cut
>> through the complexity and my confusion, and make sense of the actual
>> problems we're trying to solve.
>>
>> So, what problems exactly are we trying to solve?
>
> I'll start with replying to your final question because I think it's
> more helpful to start with the big picture than with details.
>
> So tools like libvirt want to have a single consistent interface to
> configure things on startup and at runtime. We also intend to support
> configuration files that should this time support all of the options and
> not just a few chosen ones.

Yes.

> The hypothesis is that QAPIfying the command line is the correct
> solution for both of these problems, i.e. all available command line
> options must be present in the QAPI schema and will be processed by a
> single parser shared with QMP to make sure they are consistent.

Yes.

This leads us to JSON option arguments and configuration files.
Well-suited for management applications that already use QMP.

> Adding QAPIfied versions of individual command line options are steps
> towards this goal. As soon as they exist for every option, the final
> conversion from an open coded getopt() loop (or in fact a hand crafted
> parser in the case of vl.c) to something generated from the QAPI schema
> should be reasonably easy.

Yes.

> You're right that adding a second JSON-based command line interface for
> every option can already achieve the goal of providing a unified
> external interface, at the cost of (interface and code) duplication. Is
> this duplication desirable? Certainly no. Is it acceptable? You might
> get different opinions on this one.

We'd certainly prefer CLI options to match corresponding QMP commands
exactly.

Unfortunately, existing CLI options deviate from corresponding QMP
commands, and existing CLI options without corresponding QMP commands
may violate QMP design rules.

Note: these issues pertain to human-friendly option syntax. The
machine-friendly option syntax is still limited to just a few options,
and it does match QMP there.

> In my opinion, we should try to get rid of hand crafted parsers where
> it's reasonably possible, and take advantage of the single unified
> option structure that QAPI provides. -chardev specifically has a hand
> crafted parser that essentially duplicates the automatically generated
> QAPI visitor code, except for the nesting and some differences in option
> names.

We should definitely parse JSON option arguments with the QAPI
machinery, and nothing more.

Parsing human-friendly arguments with it is desirable, but the need for
backward compatibility can make it difficult. Even where compatibility
is of no concern, simply swapping concrete JSON syntax for dotted keys
may result in human interfaces that are less than friendly.

Are we in agreement that this is the problem at hand?

> Aliases are one tool that can help bridge these differences in a generic
> way with minimal effort in the individual case. They are not _necessary_
> to solve the problem; we could instead just use manually written code to
> manipulate input QDicts so that QAPI visitors accept them. Even with
> aliases, there are a few things left in the chardev code that are
> converted this way. Aliases just greatly reduce the amount of this code
> and make the conversion declarative instead.

Understood.

> Now a key point in the previous paragraph is that aliases add a generic
> way to do this. So even if they are immediately motivated by -chardev,
> it might be worth looking at other cases they could enable if you think
> that -chardev alone isn't sufficient justification to have this tool.
>
> I guess this is the point where things become a bit less clear because
> people are just hand waving with vague ideas for additional uses.
>
> Do we need to invest more thought on these other cases? We probably do
> if it makes a difference for the answer to the question whether we want
> to add aliases to our toolbox. Does it?

I hope we can make a case for aliases without looking beyond CLI
QAPIfication. That's a wide field already, with enough opportunity to
get lost in details.

If we later put aliases to other uses, we might have to adapt them some.
That's okay. Designing for one problem we have and understand has a
much better chance of success than trying to design for all problems we
might have.

There are many CLI options to be QAPIfied. -chardev is one of the
thornier ones, which makes it a useful example.

>> But what about the dotted keys argument?
>>
>> One point of view is the difference between the JSON and the dotted keys
>> argument should be concrete syntax only. Fine print: there may be
>> arguments dotted keys can't express, but let's ignore that here.
>>
>> Another point of view is that dotted keys arguments are to JSON
>> arguments what HMP is to QMP: a (hopefully) human-f
[PATCH v2 0/3] vdpa: Check iova range on memory regions ops
At this moment vdpa will not send memory regions bigger than 1<<63.
However, actual iova range could be way more restrictive than that.

Since we can obtain the range through vdpa ioctl call, just save it
from the beginning of the operation and check against it.

Changes from v1:
* Use of int128_gt instead of plain uint64_t < comparison on memory
  range end.
* Document vhost_vdpa_section_end's return value so it's clear that
  it returns "one past end".

Eugenio Pérez (3):
  vdpa: Skip protected ram IOMMU mappings
  vdpa: Add vhost_vdpa_section_end
  vdpa: Check for iova range at mappings changes

 include/hw/virtio/vhost-vdpa.h |  2 +
 hw/virtio/vhost-vdpa.c         | 87 ++
 hw/virtio/trace-events         |  1 +
 3 files changed, 69 insertions(+), 21 deletions(-)

--
2.27.0
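To illustrate why the v1 -> v2 change moves away from a plain uint64_t
comparison on the range end, here is a minimal standalone sketch. It is
not the code from this series: the function names are hypothetical, and
it detects wrap-around by hand in 64 bits, whereas the series uses
QEMU's Int128 helpers. It only assumes that "end" means one past the
last byte, as documented for vhost_vdpa_section_end.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Naive variant: the one-past-end address is computed in 64 bits, so a
 * section that reaches the top of the address space wraps around to a
 * small value and slips through the upper-bound check.
 */
static bool sketch_out_of_range_naive(uint64_t start, uint64_t size,
                                      uint64_t iova_min, uint64_t iova_max)
{
    uint64_t end = start + size; /* wraps modulo 2^64 */

    return start < iova_min || end > iova_max;
}

/*
 * Checked variant: if the sum is smaller than its first operand, it
 * wrapped, which means the real end is at least 2^64 and therefore
 * above any 64-bit iova_max.
 */
static bool sketch_out_of_range(uint64_t start, uint64_t size,
                                uint64_t iova_min, uint64_t iova_max)
{
    uint64_t end = start + size;

    if (end < start) {
        return true;
    }

    return start < iova_min || end > iova_max;
}
```

A 128-bit intermediate, as used in the series, avoids the special case
altogether because the sum can never wrap.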
[PATCH v2 3/3] vdpa: Check for iova range at mappings changes
Check vdpa device range before updating memory regions so we don't add
any outside of it, and report the invalid change if any.

Signed-off-by: Eugenio Pérez
---
 include/hw/virtio/vhost-vdpa.h |  2 +
 hw/virtio/vhost-vdpa.c         | 68 ++
 hw/virtio/trace-events         |  1 +
 3 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index a8963da2d9..c288cf7ecb 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -13,6 +13,7 @@
 #define HW_VIRTIO_VHOST_VDPA_H
 
 #include "hw/virtio/virtio.h"
+#include "standard-headers/linux/vhost_types.h"
 
 typedef struct VhostVDPAHostNotifier {
     MemoryRegion mr;
@@ -24,6 +25,7 @@ typedef struct vhost_vdpa {
     uint32_t msg_type;
     bool iotlb_batch_begin_sent;
     MemoryListener listener;
+    struct vhost_vdpa_iova_range iova_range;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index be7c63b4ba..6654287050 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -37,20 +37,34 @@ static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section)
     return llend;
 }
 
-static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
-{
-    return (!memory_region_is_ram(section->mr) &&
-            !memory_region_is_iommu(section->mr)) ||
-            memory_region_is_protected(section->mr) ||
-           /* vhost-vDPA doesn't allow MMIO to be mapped */
-            memory_region_is_ram_device(section->mr) ||
-           /*
-            * Sizing an enabled 64-bit BAR can cause spurious mappings to
-            * addresses in the upper part of the 64-bit address space.  These
-            * are never accessed by the CPU and beyond the address width of
-            * some IOMMU hardware.  TODO: VDPA should tell us the IOMMU width.
-            */
-           section->offset_within_address_space & (1ULL << 63);
+static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
+                                                uint64_t iova_min,
+                                                uint64_t iova_max)
+{
+    Int128 llend;
+
+    if ((!memory_region_is_ram(section->mr) &&
+         !memory_region_is_iommu(section->mr)) ||
+        memory_region_is_protected(section->mr) ||
+        /* vhost-vDPA doesn't allow MMIO to be mapped */
+        memory_region_is_ram_device(section->mr)) {
+        return true;
+    }
+
+    if (section->offset_within_address_space < iova_min) {
+        error_report("RAM section out of device range (min=%lu, addr=%lu)",
+                     iova_min, section->offset_within_address_space);
+        return true;
+    }
+
+    llend = vhost_vdpa_section_end(section);
+    if (int128_gt(llend, int128_make64(iova_max))) {
+        error_report("RAM section out of device range (max=%lu, end addr=%lu)",
+                     iova_max, int128_get64(llend));
+        return true;
+    }
+
+    return false;
 }
 
 static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
@@ -162,7 +176,8 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
     void *vaddr;
     int ret;
 
-    if (vhost_vdpa_listener_skipped_section(section)) {
+    if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
+                                            v->iova_range.last)) {
         return;
     }
 
@@ -220,7 +235,8 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
     Int128 llend, llsize;
     int ret;
 
-    if (vhost_vdpa_listener_skipped_section(section)) {
+    if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
+                                            v->iova_range.last)) {
         return;
     }
 
@@ -288,9 +304,24 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
     vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s);
 }
 
+static int vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
+{
+    int ret;
+
+    ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE, &v->iova_range);
+    if (ret != 0) {
+        return ret;
+    }
+
+    trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
+                                    v->iova_range.last);
+    return ret;
+}
+
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 {
     struct vhost_vdpa *v;
+    int r;
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
     trace_vhost_vdpa_init(dev, opaque);
 
@@ -300,6 +331,11 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     v->listener = vhost_vdpa_memory_listener;
     v->msg_type = VHOST_IOTLB_MSG_V2;
 
+    r = vhost_vdpa_get_iova_range(v);
+    if (unlikely(r != 0)) {
+        return r;
+    }
+
     vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDG
[PATCH v2 2/3] vdpa: Add vhost_vdpa_section_end
Abstract this operation, which will be reused when validating the region
against the iova range that the device supports.

Signed-off-by: Eugenio Pérez
---
 hw/virtio/vhost-vdpa.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index ea1aa71ad8..be7c63b4ba 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -24,6 +24,19 @@
 #include "trace.h"
 #include "qemu-common.h"
 
+/*
+ * Return one past the end of the section. Be careful with uint64_t
+ * conversions!
+ */
+static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section)
+{
+    Int128 llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+    return llend;
+}
+
 static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
@@ -160,10 +173,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
     }
 
     iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    llend = int128_make64(section->offset_within_address_space);
-    llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
-
+    llend = vhost_vdpa_section_end(section);
     if (int128_ge(int128_make64(iova), llend)) {
         return;
     }
@@ -221,9 +231,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
     }
 
     iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    llend = int128_make64(section->offset_within_address_space);
-    llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+    llend = vhost_vdpa_section_end(section);
 
     trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend));
 
--
2.27.0
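For readers not familiar with the alignment arithmetic in the helper,
the following standalone sketch may help. It is only an approximation
under stated assumptions: a fixed 4 KiB page size instead of
TARGET_PAGE_MASK, plain uint64_t instead of Int128 (so it ignores the
overflow case the real helper is careful about), and hypothetical
sketch_* names that do not exist in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch only: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Round an address up to the next page boundary. */
static uint64_t sketch_page_align_up(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
}

/*
 * One past the end of the section, rounded down to a page boundary.
 * Unlike the Int128 version, this wraps if offset + size exceeds 64 bits.
 */
static uint64_t sketch_section_end(uint64_t offset, uint64_t size)
{
    return (offset + size) & SKETCH_PAGE_MASK;
}

/*
 * True when no whole page remains between the aligned start and the
 * aligned end, which corresponds to the early return guarded by
 * int128_ge() in the region_add listener.
 */
static bool sketch_section_is_empty(uint64_t offset, uint64_t size)
{
    return sketch_page_align_up(offset) >= sketch_section_end(offset, size);
}
```

With these definitions a section at offset 0x1800 of size 0x2000 is
trimmed to the single page [0x2000, 0x3000), while a section at the same
offset of size 0x400 contains no whole page and would be skipped.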
Re: [PATCH v0 0/2] virtio-blk and vhost-user-blk cross-device migration
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Tue, Oct 05, 2021 at 02:18:40AM +0300, Roman Kagan wrote:
> > On Mon, Oct 04, 2021 at 11:11:00AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Oct 04, 2021 at 06:07:29PM +0300, Denis Plotnikov wrote:
> > > > It might be useful for the cases when a slow block layer should be
> > > > replaced with a more performant one on a running VM without stopping,
> > > > i.e. with very low downtime comparable with the one on migration.
> > > >
> > > > It's possible to achieve that for two reasons:
> > > >
> > > > 1. The VMStates of "virtio-blk" and "vhost-user-blk" are almost the
> > > >    same. They consist of the identical VMSTATE_VIRTIO_DEVICE and
> > > >    differ from each other in the values of migration service fields
> > > >    only.
> > > > 2. The device driver used in the guest is the same: virtio-blk
> > > >
> > > > In the series cross-migration is achieved by adding a new type.
> > > > The new type uses virtio-blk VMState instead of vhost-user-blk specific
> > > > VMstate, also it implements migration save/load callbacks to be
> > > > compatible with migration stream produced by "virtio-blk" device.
> > > >
> > > > Adding the new type instead of modifying the existing one is
> > > > convenient. It makes it easy to distinguish the new
> > > > virtio-blk-compatible vhost-user-blk device from the existing
> > > > non-compatible one using qemu machinery without any other
> > > > modifications. That gives all the variety of qemu device related
> > > > constraints out of box.
> > >
> > > Hmm I'm not sure I understand. What is the advantage for the user?
> > > What if vhost-user-blk became an alias for vhost-user-virtio-blk?
> > > We could add some hacks to make it compatible for old machine types.
> >
> > The point is that virtio-blk and vhost-user-blk are not
> > migration-compatible ATM. OTOH they are the same device from the guest
> > POV so there's nothing fundamentally preventing the migration between
> > the two. In particular, we see it as a means to switch between the
> > storage backend transports via live migration without disrupting the
> > guest.
> >
> > Migration-wise virtio-blk and vhost-user-blk have in common
> >
> > - the content of the VMState -- VMSTATE_VIRTIO_DEVICE
> >
> > The two differ in
> >
> > - the name and the version of the VMStateDescription
> >
> > - virtio-blk has an extra migration section (via .save/.load callbacks
> >   on VirtioDeviceClass) containing requests in flight
> >
> > It looks like to become migration-compatible with virtio-blk,
> > vhost-user-blk has to start using VMStateDescription of virtio-blk and
> > provide compatible .save/.load callbacks. It isn't entirely obvious how
> > to make this machine-type-dependent, so we came up with a simpler idea
> > of defining a new device that shares most of the implementation with the
> > original vhost-user-blk except for the migration stuff. We're certainly
> > open to suggestions on how to reconcile this under a single
> > vhost-user-blk device, as this would be more user-friendly indeed.
> >
> > We considered using a class property for this and defining the
> > respective compat clause, but IIUC the class constructors (where .vmsd
> > and .save/.load are defined) are not supposed to depend on class
> > properties.
> >
> > Thanks,
> > Roman.
>
> So the question is how to make vmsd depend on machine type.
> CC Eduardo who poked at this kind of compat stuff recently,
> paolo who looked at qom things most recently and dgilbert
> for advice on migration.

I don't think I've seen anyone change vmsd name dependent on machine
type; making fields appear/disappear is easy - that just ends up as a
property on the device that's checked; I guess if that property is
global (rather than per instance) then you can check it in
vhost_user_blk_class_init and swing the dc->vmsd pointer?

Dave

> --
> MST
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
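As a rough illustration of the "swing the dc->vmsd pointer" idea above,
here is a self-contained sketch in plain C. Everything in it is
hypothetical: the sketch_* structs are simplified stand-ins for QEMU's
VMStateDescription and DeviceClass, the name and version values are
placeholders, and the global flag stands for whatever machine-type
dependent setting would end up being consulted. It does not address the
ordering concern Roman raised about class constructors depending on
class properties.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a migration state description. */
struct sketch_vmsd {
    const char *name;
    int version_id;
};

/* Hypothetical, simplified stand-in for a device class. */
struct sketch_device_class {
    const struct sketch_vmsd *vmsd;
};

/* Placeholder descriptions; the values are for illustration only. */
static const struct sketch_vmsd sketch_vmsd_native = { "vhost-user-blk", 1 };
static const struct sketch_vmsd sketch_vmsd_compat = { "virtio-blk", 2 };

/*
 * Hypothetical global switch. In the scenario discussed it would have to
 * be settled before class initialization runs.
 */
static bool sketch_virtio_blk_compat;

/* Pick the description once, at class initialization time. */
static void sketch_class_init(struct sketch_device_class *dc)
{
    dc->vmsd = sketch_virtio_blk_compat ? &sketch_vmsd_compat
                                        : &sketch_vmsd_native;
}
```

The point of the sketch is only that the choice is made per class rather
than per instance, so every device of that class shares one description.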