Re: [PATCH 1/2] i2c: virtio: disable timeout handling
On 2021/10/20 14:41, Viresh Kumar wrote: On 20-10-21, 14:35, Jie Deng wrote: Yes, but we need to know what's the best value to be configured for a specific "other side". I think the "other side" should be more aware of what value is reasonable to be used. If we _really_ need that, then it would require an update to the specification first. I am not sure if the other side is the only party in play here. It depends on the entire setup and not just the hardware device. Specially with virtualisation things become really slow because of context switches, etc. It may be better for the guest userspace to decide on the value. Since this is specially for virtualisation, the kernel may set the value to few HZ by default, lets say 10 (Yeah its really high) :). I'm OK with this way for the simplicity. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
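For reference, the "few HZ by default" idea discussed above would amount to a one-line change in the guest driver's probe path. A minimal sketch, assuming the driver's adapter field is vi->adap (value and placement are not settled in this thread); userspace can still tune the value afterwards through the I2C_TIMEOUT ioctl on the i2c-dev node:

	/*
	 * Sketch only: pick a generous default because request completion
	 * over virtio can be slowed down by host scheduling and context
	 * switches; userspace may still override it via I2C_TIMEOUT.
	 */
	vi->adap.timeout = HZ * 10;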
Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote: If an untrusted device neogitates BLK_F_MQ but advertises a zero s/neogitates/negotiates num_queues, the driver may end up trying to allocating zero size buffers where ZERO_SIZE_PTR is returned which may pass the checking against the NULL. This will lead unexpected results. Fixing this by failing the probe in this case. Cc: Paolo Bonzini Cc: Stefan Hajnoczi Cc: Stefano Garzarella Signed-off-by: Jason Wang --- drivers/block/virtio_blk.c | 4 1 file changed, 4 insertions(+) Should we CC stable? Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
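The diffstat above omits the body of the change; the check being discussed is roughly the following, likely placed in init_vq() right after the queue count is read. This is a sketch built around the driver's existing virtio_cread_feature() call, not the patch as merged:

	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
				   struct virtio_blk_config, num_queues,
				   &num_vqs);
	if (err)
		num_vqs = 1;

	/*
	 * An untrusted device may negotiate BLK_F_MQ yet report zero
	 * queues; a zero-sized allocation would return ZERO_SIZE_PTR and
	 * slip past NULL checks, so refuse to probe instead.
	 */
	if (!err && !num_vqs)
		return -EINVAL;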
Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote: > On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote: > > If an untrusted device neogitates BLK_F_MQ but advertises a zero > > s/neogitates/negotiates > > > num_queues, the driver may end up trying to allocating zero size > > buffers where ZERO_SIZE_PTR is returned which may pass the checking > > against the NULL. This will lead unexpected results. > > > > Fixing this by failing the probe in this case. > > > > Cc: Paolo Bonzini > > Cc: Stefan Hajnoczi > > Cc: Stefano Garzarella > > Signed-off-by: Jason Wang > > --- > > drivers/block/virtio_blk.c | 4 > > 1 file changed, 4 insertions(+) > > Should we CC stable? > > Reviewed-by: Stefano Garzarella No IMO - I don't think we can reasonably expect stable to become protected against attacks on encrypted guests. That's a new feature, not a bugfix. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote: > If an untrusted device neogitates BLK_F_MQ but advertises a zero > num_queues, the driver may end up trying to allocating zero size > buffers where ZERO_SIZE_PTR is returned which may pass the checking > against the NULL. This will lead unexpected results. > > Fixing this by failing the probe in this case. > > Cc: Paolo Bonzini > Cc: Stefan Hajnoczi > Cc: Stefano Garzarella > Signed-off-by: Jason Wang > --- > drivers/block/virtio_blk.c | 4 > 1 file changed, 4 insertions(+) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed
On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote: > On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin > wrote: > > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote: > > > When using indirect with packed, we don't check for allocation failures. > > > This patch checks that and fall back on direct. > > > > > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support") > > > Signed-off-by: Xuan Zhuo > > > --- > > > drivers/virtio/virtio_ring.c | 13 ++--- > > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > > index 91a46c4da87d..44a03b6e4dc4 100644 > > > --- a/drivers/virtio/virtio_ring.c > > > +++ b/drivers/virtio/virtio_ring.c > > > @@ -1065,6 +1065,9 @@ static int virtqueue_add_indirect_packed(struct > > > vring_virtqueue *vq, > > > > > > head = vq->packed.next_avail_idx; > > > desc = alloc_indirect_packed(total_sg, gfp); > > > + if (!desc) > > > + /* fall back on direct */ > > > > this comment belongs in virtqueue_add_packed right after > > return. > > > > > + return -EAGAIN; > > > > > > > -ENOMEM surely? > > virtqueue_add_indirect_packed() will return -ENOMEM when dma mmap fails, so we > have to choose a different error code. > > Thanks. That's exactly the point. Why would we want to recover from allocation failure but not DMA map failure? > > > > > if (unlikely(vq->vq.num_free < 1)) { > > > pr_debug("Can't add buf len 1 - avail = 0\n"); > > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct > > > virtqueue *_vq, > > > unsigned int i, n, c, descs_used, err_idx; > > > __le16 head_flags, flags; > > > u16 head, id, prev, curr, avail_used_flags; > > > + int err; > > > > > > START_USE(vq); > > > > > > @@ -1191,9 +1195,12 @@ static inline int virtqueue_add_packed(struct > > > virtqueue *_vq, > > > > > > BUG_ON(total_sg == 0); > > > > > > - if (virtqueue_use_indirect(_vq, total_sg)) > > > - return virtqueue_add_indirect_packed(vq, sgs, total_sg, > > > - out_sgs, in_sgs, data, gfp); > > > + if (virtqueue_use_indirect(_vq, total_sg)) { > > > + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs, > > > + in_sgs, data, gfp); > > > + if (err != -EAGAIN) > > > + return err; > > > + } > > > > > > head = vq->packed.next_avail_idx; > > > avail_used_flags = vq->packed.avail_used_flags; > > > -- > > > 2.31.0 > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
On Wed, Oct 20, 2021 at 03:37:31AM -0400, Michael S. Tsirkin wrote: On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote: On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote: > If an untrusted device neogitates BLK_F_MQ but advertises a zero s/neogitates/negotiates > num_queues, the driver may end up trying to allocating zero size > buffers where ZERO_SIZE_PTR is returned which may pass the checking > against the NULL. This will lead unexpected results. > > Fixing this by failing the probe in this case. > > Cc: Paolo Bonzini > Cc: Stefan Hajnoczi > Cc: Stefano Garzarella > Signed-off-by: Jason Wang > --- > drivers/block/virtio_blk.c | 4 > 1 file changed, 4 insertions(+) Should we CC stable? Reviewed-by: Stefano Garzarella No IMO - I don't think we can reasonably expect stable to become protected against attacks on encrypted guests. That's a new feature, not a bugfix. Yep, make sense. I had only seen the single patch, not the entire series, and it seemed like a fix. Viewed as a whole, it makes sense to consider it a new feature to improve audits in the guest. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] i2c: virtio: fix completion handling
On 2021/10/19 16:22, Viresh Kumar wrote: On 19-10-21, 09:46, Vincent Whitchurch wrote: static void virtio_i2c_msg_done(struct virtqueue *vq) { - struct virtio_i2c *vi = vq->vdev->priv; + struct virtio_i2c_req *req; + unsigned int len; - complete(&vi->completion); + while ((req = virtqueue_get_buf(vq, &len))) + complete(&req->completion); Instead of adding a completion for each request and using only the last one, maybe we can do this instead here: while ((req = virtqueue_get_buf(vq, &len))) { if (req->out_hdr.flags == cpu_to_le32(VIRTIO_I2C_FLAGS_FAIL_NEXT)) Is this for the last one check ? For the last one, this bit should be cleared, right ? complete(&vi->completion); } Since we already know which is the last one, we can also check at this point if buffers for all other requests are received or not. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v11 2/3] tty: hvc: pass DMA capable memory to put_chars()
A: http://en.wikipedia.org/wiki/Top_post Q: Were do I find info about this thing called top-posting? A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? A: No. Q: Should I include quotations after my reply? http://daringfireball.net/2007/07/on_top On Wed, Oct 20, 2021 at 04:47:23PM +0800, Xianting Tian wrote: > hi Greg, > > Could I get your comments of this new version patches? thanks It has been less than 5 days. Please relax, and only worry after 2 weeks have gone by. We have lots of patches to review. To help maintainers out, why don't you review other patches on the mailing lists as well while you wait? thanks, greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
On Wed, Oct 20, 2021 at 2:52 PM Eugenio Perez Martin wrote: > > On Wed, Oct 20, 2021 at 4:07 AM Jason Wang wrote: > > > > On Wed, Oct 20, 2021 at 10:02 AM Jason Wang wrote: > > > > > > On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin > > > wrote: > > > > > > > > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang wrote: > > > > > > > > > > > > > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道: > > > > > > Use translations added in VhostIOVATree in SVQ. > > > > > > > > > > > > Now every element needs to store the previous address also, so > > > > > > VirtQueue > > > > > > can consume the elements properly. This adds a little overhead per > > > > > > VQ > > > > > > element, having to allocate more memory to stash them. As a possible > > > > > > optimization, this allocation could be avoided if the descriptor is > > > > > > not > > > > > > a chain but a single one, but this is left undone. > > > > > > > > > > > > TODO: iova range should be queried before, and add logic to fail > > > > > > when > > > > > > GPA is outside of its range and memory listener or svq add it. > > > > > > > > > > > > Signed-off-by: Eugenio Pérez > > > > > > --- > > > > > > hw/virtio/vhost-shadow-virtqueue.h | 4 +- > > > > > > hw/virtio/vhost-shadow-virtqueue.c | 130 > > > > > > - > > > > > > hw/virtio/vhost-vdpa.c | 40 - > > > > > > hw/virtio/trace-events | 1 + > > > > > > 4 files changed, 152 insertions(+), 23 deletions(-) > > > > > > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h > > > > > > b/hw/virtio/vhost-shadow-virtqueue.h > > > > > > index b7baa424a7..a0e6b5267a 100644 > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h > > > > > > @@ -11,6 +11,7 @@ > > > > > > #define VHOST_SHADOW_VIRTQUEUE_H > > > > > > > > > > > > #include "hw/virtio/vhost.h" > > > > > > +#include "hw/virtio/vhost-iova-tree.h" > > > > > > > > > > > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; > > > > > > > > > > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, > > > > > > unsigned idx, > > > > > > void vhost_svq_stop(struct vhost_dev *dev, unsigned idx, > > > > > > VhostShadowVirtqueue *svq); > > > > > > > > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int > > > > > > idx); > > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx, > > > > > > +VhostIOVATree *iova_map); > > > > > > > > > > > > void vhost_svq_free(VhostShadowVirtqueue *vq); > > > > > > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c > > > > > > b/hw/virtio/vhost-shadow-virtqueue.c > > > > > > index 2fd0bab75d..9db538547e 100644 > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c > > > > > > @@ -11,12 +11,19 @@ > > > > > > #include "hw/virtio/vhost-shadow-virtqueue.h" > > > > > > #include "hw/virtio/vhost.h" > > > > > > #include "hw/virtio/virtio-access.h" > > > > > > +#include "hw/virtio/vhost-iova-tree.h" > > > > > > > > > > > > #include "standard-headers/linux/vhost_types.h" > > > > > > > > > > > > #include "qemu/error-report.h" > > > > > > #include "qemu/main-loop.h" > > > > > > > > > > > > +typedef struct SVQElement { > > > > > > +VirtQueueElement elem; > > > > > > +void **in_sg_stash; > > > > > > +void **out_sg_stash; > > > > > > +} SVQElement; > > > > > > + > > > > > > /* Shadow virtqueue to relay notifications */ > > > > > > typedef struct VhostShadowVirtqueue { > > > > > > /* Shadow vring */ > > > > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue { > > > > > > 
/* Virtio device */ > > > > > > VirtIODevice *vdev; > > > > > > > > > > > > +/* IOVA mapping if used */ > > > > > > +VhostIOVATree *iova_map; > > > > > > + > > > > > > /* Map for returning guest's descriptors */ > > > > > > -VirtQueueElement **ring_id_maps; > > > > > > +SVQElement **ring_id_maps; > > > > > > > > > > > > /* Next head to expose to device */ > > > > > > uint16_t avail_idx_shadow; > > > > > > @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t > > > > > > *dev_features) > > > > > > continue; > > > > > > > > > > > > case VIRTIO_F_ACCESS_PLATFORM: > > > > > > -/* SVQ needs this feature disabled. Can't continue */ > > > > > > -if (*dev_features & BIT_ULL(b)) { > > > > > > -clear_bit(b, dev_features); > > > > > > -r = false; > > > > > > -} > > > > > > -break; > > > > > > - > > > > > > case VIRTIO_F_VERSION_1: > > > > > > /* SVQ needs this feature, so can't continue */ > > > > > > if (!(*dev_features & BIT_ULL(b))) { > > > > > > @@ -126,6 +129,64 @@ static void > > > > > > vhost_svq_set_notification(VhostShadowVirtqueue *svq, b
Re: [PATCH 2/2] i2c: virtio: fix completion handling
On 20-10-21, 16:54, Jie Deng wrote: > > On 2021/10/19 16:22, Viresh Kumar wrote: > > On 19-10-21, 09:46, Vincent Whitchurch wrote: > > > static void virtio_i2c_msg_done(struct virtqueue *vq) > > > { > > > - struct virtio_i2c *vi = vq->vdev->priv; > > > + struct virtio_i2c_req *req; > > > + unsigned int len; > > > - complete(&vi->completion); > > > + while ((req = virtqueue_get_buf(vq, &len))) > > > + complete(&req->completion); > > Instead of adding a completion for each request and using only the > > last one, maybe we can do this instead here: > > > > while ((req = virtqueue_get_buf(vq, &len))) { > > if (req->out_hdr.flags == > > cpu_to_le32(VIRTIO_I2C_FLAGS_FAIL_NEXT)) > > > Is this for the last one check ? For the last one, this bit should be > cleared, right ? Oops, you are right. This should be `!=` instead. Thanks. -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V6] gpio: virtio: Add IRQ support
This patch adds IRQ support for the virtio GPIO driver. Note that this uses the irq_bus_lock/unlock() callbacks, since those operations over virtio may sleep. Reviewed-by: Linus Walleij Signed-off-by: Viresh Kumar --- Bartosz, The spec changes are close to merging now, I will let you know once the ballot is closed and the spec changes are merged. You can then pick this patch for 5.16. V5->V6: - Sent it separately as the other patches are already merged. - Queue the buffers only after enabling the irq (as per latest spec). - Migrate to raw_spinlock_t. drivers/gpio/Kconfig | 1 + drivers/gpio/gpio-virtio.c | 310 ++- include/uapi/linux/virtio_gpio.h | 25 +++ 3 files changed, 332 insertions(+), 4 deletions(-) diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index fae5141251e5..bfa723ff8e7c 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -1674,6 +1674,7 @@ config GPIO_MOCKUP config GPIO_VIRTIO tristate "VirtIO GPIO support" depends on VIRTIO + select GPIOLIB_IRQCHIP help Say Y here to enable guest support for virtio-based GPIO controllers. diff --git a/drivers/gpio/gpio-virtio.c b/drivers/gpio/gpio-virtio.c index d24f1c9264bc..73fbe2eda4b9 100644 --- a/drivers/gpio/gpio-virtio.c +++ b/drivers/gpio/gpio-virtio.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -28,12 +29,30 @@ struct virtio_gpio_line { unsigned int rxlen; }; +struct vgpio_irq_line { + u8 type; + bool disabled; + bool masked; + bool queued; + bool update_pending; + bool queue_pending; + + struct virtio_gpio_irq_request ireq cacheline_aligned; + struct virtio_gpio_irq_response ires cacheline_aligned; +}; + struct virtio_gpio { struct virtio_device *vdev; struct mutex lock; /* Protects virtqueue operation */ struct gpio_chip gc; struct virtio_gpio_line *lines; struct virtqueue *request_vq; + + /* irq support */ + struct virtqueue *event_vq; + struct mutex irq_lock; /* Protects irq operation */ + raw_spinlock_t eventq_lock; /* Protects queuing of the buffer */ + struct vgpio_irq_line *irq_lines; }; static int _virtio_gpio_req(struct virtio_gpio *vgpio, u16 type, u16 gpio, @@ -186,6 +205,244 @@ static void virtio_gpio_set(struct gpio_chip *gc, unsigned int gpio, int value) virtio_gpio_req(vgpio, VIRTIO_GPIO_MSG_SET_VALUE, gpio, value, NULL); } +/* Interrupt handling */ +static void virtio_gpio_irq_prepare(struct virtio_gpio *vgpio, u16 gpio) +{ + struct vgpio_irq_line *irq_line = &vgpio->irq_lines[gpio]; + struct virtio_gpio_irq_request *ireq = &irq_line->ireq; + struct virtio_gpio_irq_response *ires = &irq_line->ires; + struct scatterlist *sgs[2], req_sg, res_sg; + int ret; + + if (WARN_ON(irq_line->queued || irq_line->masked || irq_line->disabled)) + return; + + ireq->gpio = cpu_to_le16(gpio); + sg_init_one(&req_sg, ireq, sizeof(*ireq)); + sg_init_one(&res_sg, ires, sizeof(*ires)); + sgs[0] = &req_sg; + sgs[1] = &res_sg; + + ret = virtqueue_add_sgs(vgpio->event_vq, sgs, 1, 1, irq_line, GFP_ATOMIC); + if (ret) { + dev_err(&vgpio->vdev->dev, "failed to add request to eventq\n"); + return; + } + + irq_line->queued = true; + virtqueue_kick(vgpio->event_vq); +} + +static void virtio_gpio_irq_enable(struct irq_data *d) +{ + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); + struct virtio_gpio *vgpio = gpiochip_get_data(gc); + struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq]; + + raw_spin_lock(&vgpio->eventq_lock); + irq_line->disabled = false; + irq_line->masked = false; + irq_line->queue_pending = true; + raw_spin_unlock(&vgpio->eventq_lock); + + 
irq_line->update_pending = true; +} + +static void virtio_gpio_irq_disable(struct irq_data *d) +{ + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); + struct virtio_gpio *vgpio = gpiochip_get_data(gc); + struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq]; + + raw_spin_lock(&vgpio->eventq_lock); + irq_line->disabled = true; + irq_line->masked = true; + irq_line->queue_pending = false; + raw_spin_unlock(&vgpio->eventq_lock); + + irq_line->update_pending = true; +} + +static void virtio_gpio_irq_mask(struct irq_data *d) +{ + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); + struct virtio_gpio *vgpio = gpiochip_get_data(gc); + struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq]; + + raw_spin_lock(&vgpio->eventq_lock); + irq_line->masked = true; + raw_spin_unlock(&vgpio->eventq_lock); +} + +static void virtio_gpio_irq_unmask(struct irq_data *d) +{ + struct gpio
Re: [PATCH 2/2] i2c: virtio: fix completion handling
On Wed, Oct 20, 2021 at 11:17:21AM +0200, Viresh Kumar wrote: > On 20-10-21, 16:54, Jie Deng wrote: > > > > On 2021/10/19 16:22, Viresh Kumar wrote: > > > On 19-10-21, 09:46, Vincent Whitchurch wrote: > > > > static void virtio_i2c_msg_done(struct virtqueue *vq) > > > > { > > > > - struct virtio_i2c *vi = vq->vdev->priv; > > > > + struct virtio_i2c_req *req; > > > > + unsigned int len; > > > > - complete(&vi->completion); > > > > + while ((req = virtqueue_get_buf(vq, &len))) > > > > + complete(&req->completion); > > > Instead of adding a completion for each request and using only the > > > last one, maybe we can do this instead here: > > > > > > while ((req = virtqueue_get_buf(vq, &len))) { > > > if (req->out_hdr.flags == > > > cpu_to_le32(VIRTIO_I2C_FLAGS_FAIL_NEXT)) > > > > > > Is this for the last one check ? For the last one, this bit should be > > cleared, right ? > > Oops, you are right. This should be `!=` instead. Thanks. I don't quite understand how that would be safe since virtqueue_add_sgs() can fail after a few iterations and all queued request buffers can have FAIL_NEXT set. In such a case, we would end up waiting forever with your proposed change, wouldn't we? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] i2c: virtio: fix completion handling
On 20-10-21, 12:38, Vincent Whitchurch wrote: > I don't quite understand how that would be safe since > virtqueue_add_sgs() can fail after a few iterations and all queued > request buffers can have FAIL_NEXT set. In such a case, we would end up > waiting forever with your proposed change, wouldn't we? Good point. I didn't think of that earlier. I think a good simple way of handling this is counting the number of buffers sent and received. Once they match, we are done. That shouldn't break anything else I believe. -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
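A rough sketch of the counting idea (the num_queued/num_completed fields are hypothetical and would be reset under the adapter lock before each transfer; this is not the code as merged):

static void virtio_i2c_msg_done(struct virtqueue *vq)
{
	struct virtio_i2c *vi = vq->vdev->priv;
	struct virtio_i2c_req *req;
	unsigned int len;

	while ((req = virtqueue_get_buf(vq, &len))) {
		/* wake the transfer only once every queued buffer is back */
		if (++vi->num_completed == vi->num_queued)
			complete(&vi->completion);
	}
}

This keeps a single completion per transfer and no longer depends on which request carries FAIL_NEXT.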
Re: [PATCH 1/2] i2c: virtio: disable timeout handling
On Wed, Oct 20, 2021 at 09:04:45AM +0200, Jie Deng wrote: > On 2021/10/20 14:41, Viresh Kumar wrote: > > On 20-10-21, 14:35, Jie Deng wrote: > >> Yes, but we need to know what's the best value to be configured for a > >> specific "other side". > >> > >> I think the "other side" should be more aware of what value is reasonable > >> to > >> be used. > > If we _really_ need that, then it would require an update to the > > specification first. > > > > I am not sure if the other side is the only party in play here. It > > depends on the entire setup and not just the hardware device. > > Specially with virtualisation things become really slow because of > > context switches, etc. It may be better for the guest userspace to > > decide on the value. > > > > Since this is specially for virtualisation, the kernel may set the > > value to few HZ by default, lets say 10 (Yeah its really high) :). > > I'm OK with this way for the simplicity. That would not be safe. Userspace can change this timeout and the end result with the current implementation of the driver is potentially kernel memory corruption, which is something userspace of course never should be able to trigger. Even if the timeout were hardcoded in the driver and the driver made to ignore what userspace requests, it would need to be set to a ridiculously high value so that we can guarantee that the timeout never ever occurs (since the result is potentially random kernel memory corruption). Which is basically equivalent to disabling the timeout entirely as my patch does. If the timeout cannot be disabled, then the driver should be fixed to always copy buffers and hold on to them to avoid memory corruption in the case of timeout, as I mentioned in my commit message. That would be quite a substantial change to the driver so it's not something I'm personally comfortable with doing, especially not this late in the -rc cycle, so I'd leave that to others. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/2] i2c: virtio: disable timeout handling
On 20-10-21, 12:55, Vincent Whitchurch wrote: > If the timeout cannot be disabled, then the driver should be fixed to > always copy buffers and hold on to them to avoid memory corruption in > the case of timeout, as I mentioned in my commit message. That would be > quite a substantial change to the driver so it's not something I'm > personally comfortable with doing, especially not this late in the -rc > cycle, so I'd leave that to others. Or we can avoid clearing up and freeing the buffers here until the point where the buffers are returned by the host. Until that happens, we can avoid taking new requests but return to the earlier caller with timeout failure. That would avoid corruption, by freeing buffers sooner, and not hanging of the kernel. -- viresh ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
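A very rough sketch of that alternative on the transfer side, with a hypothetical flag to refuse new transfers until the queue has drained (illustrative only, not a concrete proposal):

	if (!wait_for_completion_timeout(&vi->completion, adap->timeout)) {
		/*
		 * The device still owns the request buffers, so do not
		 * free or reuse them here; the done callback releases
		 * them whenever they eventually come back.
		 */
		vi->draining = true;	/* hypothetical: block further transfers */
		return -ETIMEDOUT;
	}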
[PATCH v5 0/2] virtio_ring: check desc == NULL when using indirect with packed
Fix theoretical issues in virtio_ring. v5: Update commit message. v4: Inside the #2 patch, virtqueue_add_indirect_packed() return -EAGAIN when desc == NULL. v3: Update commit message of the #1 patch. v2: Separate the style fix into a single patch. Xuan Zhuo (2): virtio_ring: make virtqueue_add_indirect_packed prettier virtio_ring: check desc == NULL when using indirect with packed drivers/virtio/virtio_ring.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v5 1/2] virtio_ring: make virtqueue_add_indirect_packed prettier
Align the arguments of virtqueue_add_indirect_packed() to the open ( to make it look prettier. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_ring.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index dd95dfd85e98..91a46c4da87d 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1050,12 +1050,12 @@ static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg, } static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq, - struct scatterlist *sgs[], - unsigned int total_sg, - unsigned int out_sgs, - unsigned int in_sgs, - void *data, - gfp_t gfp) +struct scatterlist *sgs[], +unsigned int total_sg, +unsigned int out_sgs, +unsigned int in_sgs, +void *data, +gfp_t gfp) { struct vring_packed_desc *desc; struct scatterlist *sg; -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v5 2/2] virtio_ring: check desc == NULL when using indirect with packed
When using indirect with packed, we don't check for allocation failures. This patch checks that and fall back on direct. Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support") Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 91a46c4da87d..552055157b21 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1065,6 +1065,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq, head = vq->packed.next_avail_idx; desc = alloc_indirect_packed(total_sg, gfp); + if (!desc) + return -EAGAIN; if (unlikely(vq->vq.num_free < 1)) { pr_debug("Can't add buf len 1 - avail = 0\n"); @@ -1176,6 +1178,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq, unsigned int i, n, c, descs_used, err_idx; __le16 head_flags, flags; u16 head, id, prev, curr, avail_used_flags; + int err; START_USE(vq); @@ -1191,9 +1194,14 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq, BUG_ON(total_sg == 0); - if (virtqueue_use_indirect(_vq, total_sg)) - return virtqueue_add_indirect_packed(vq, sgs, total_sg, - out_sgs, in_sgs, data, gfp); + if (virtqueue_use_indirect(_vq, total_sg)) { + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs, + in_sgs, data, gfp); + if (err != -EAGAIN) + return err; + + /* fall back on direct */ + } head = vq->packed.next_avail_idx; avail_used_flags = vq->packed.avail_used_flags; -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 2/2] virtio_ring: check desc == NULL when using indirect with packed
On Wed, Oct 20, 2021 at 07:05:51PM +0800, Xuan Zhuo wrote: > When using indirect with packed, we don't check for allocation failures. > This patch checks that and fall back on direct. > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support") > Signed-off-by: Xuan Zhuo > --- > drivers/virtio/virtio_ring.c | 14 +++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index 91a46c4da87d..552055157b21 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -1065,6 +1065,8 @@ static int virtqueue_add_indirect_packed(struct > vring_virtqueue *vq, > > head = vq->packed.next_avail_idx; > desc = alloc_indirect_packed(total_sg, gfp); > + if (!desc) > + return -EAGAIN; > > if (unlikely(vq->vq.num_free < 1)) { > pr_debug("Can't add buf len 1 - avail = 0\n"); I still think ENOMEM is better, this way we handle dma map failures in the same way. > @@ -1176,6 +1178,7 @@ static inline int virtqueue_add_packed(struct virtqueue > *_vq, > unsigned int i, n, c, descs_used, err_idx; > __le16 head_flags, flags; > u16 head, id, prev, curr, avail_used_flags; > + int err; > > START_USE(vq); > > @@ -1191,9 +1194,14 @@ static inline int virtqueue_add_packed(struct > virtqueue *_vq, > > BUG_ON(total_sg == 0); > > - if (virtqueue_use_indirect(_vq, total_sg)) > - return virtqueue_add_indirect_packed(vq, sgs, total_sg, > - out_sgs, in_sgs, data, gfp); > + if (virtqueue_use_indirect(_vq, total_sg)) { > + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs, > + in_sgs, data, gfp); > + if (err != -EAGAIN) > + return err; > + > + /* fall back on direct */ > + } > > head = vq->packed.next_avail_idx; > avail_used_flags = vq->packed.avail_used_flags; > -- > 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed
On Wed, 20 Oct 2021 04:24:33 -0400, Michael S. Tsirkin wrote: > On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote: > > On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin > > wrote: > > > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote: > > > > When using indirect with packed, we don't check for allocation failures. > > > > This patch checks that and fall back on direct. > > > > > > > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support") > > > > Signed-off-by: Xuan Zhuo > > > > --- > > > > drivers/virtio/virtio_ring.c | 13 ++--- > > > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > > > index 91a46c4da87d..44a03b6e4dc4 100644 > > > > --- a/drivers/virtio/virtio_ring.c > > > > +++ b/drivers/virtio/virtio_ring.c > > > > @@ -1065,6 +1065,9 @@ static int virtqueue_add_indirect_packed(struct > > > > vring_virtqueue *vq, > > > > > > > > head = vq->packed.next_avail_idx; > > > > desc = alloc_indirect_packed(total_sg, gfp); > > > > + if (!desc) > > > > + /* fall back on direct */ > > > > > > this comment belongs in virtqueue_add_packed right after > > > return. > > > > > > > + return -EAGAIN; > > > > > > > > > > -ENOMEM surely? > > > > virtqueue_add_indirect_packed() will return -ENOMEM when dma mmap fails, so > > we > > have to choose a different error code. > > > > Thanks. > > That's exactly the point. Why would we want to recover from allocation > failure but not DMA map failure? >From indirect to direct mode, there is no need to allocate memory, so if we encounter memory allocation failure, we can fall back from indirect to direct mode. Memory allocation failure is a temporary problem. And if you encounter a dma mmap error, this situation should be notified to the upper layer. Thanks. > > > > > > > > if (unlikely(vq->vq.num_free < 1)) { > > > > pr_debug("Can't add buf len 1 - avail = 0\n"); > > > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct > > > > virtqueue *_vq, > > > > unsigned int i, n, c, descs_used, err_idx; > > > > __le16 head_flags, flags; > > > > u16 head, id, prev, curr, avail_used_flags; > > > > + int err; > > > > > > > > START_USE(vq); > > > > > > > > @@ -1191,9 +1195,12 @@ static inline int virtqueue_add_packed(struct > > > > virtqueue *_vq, > > > > > > > > BUG_ON(total_sg == 0); > > > > > > > > - if (virtqueue_use_indirect(_vq, total_sg)) > > > > - return virtqueue_add_indirect_packed(vq, sgs, total_sg, > > > > - out_sgs, in_sgs, data, gfp); > > > > + if (virtqueue_use_indirect(_vq, total_sg)) { > > > > + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, > > > > out_sgs, > > > > + in_sgs, data, gfp); > > > > + if (err != -EAGAIN) > > > > + return err; > > > > + } > > > > > > > > head = vq->packed.next_avail_idx; > > > > avail_used_flags = vq->packed.avail_used_flags; > > > > -- > > > > 2.31.0 > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v6 2/2] virtio_ring: check desc == NULL when using indirect with packed
When using indirect with packed, we don't check for allocation failures. This patch checks that and fall back on direct. Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support") Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 91a46c4da87d..bc92a2faa28f 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1065,6 +1065,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq, head = vq->packed.next_avail_idx; desc = alloc_indirect_packed(total_sg, gfp); + if (!desc) + return -ENOMEM; if (unlikely(vq->vq.num_free < 1)) { pr_debug("Can't add buf len 1 - avail = 0\n"); @@ -1176,6 +1178,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq, unsigned int i, n, c, descs_used, err_idx; __le16 head_flags, flags; u16 head, id, prev, curr, avail_used_flags; + int err; START_USE(vq); @@ -1191,9 +1194,14 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq, BUG_ON(total_sg == 0); - if (virtqueue_use_indirect(_vq, total_sg)) - return virtqueue_add_indirect_packed(vq, sgs, total_sg, - out_sgs, in_sgs, data, gfp); + if (virtqueue_use_indirect(_vq, total_sg)) { + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs, + in_sgs, data, gfp); + if (err != -ENOMEM) + return err; + + /* fall back on direct */ + } head = vq->packed.next_avail_idx; avail_used_flags = vq->packed.avail_used_flags; -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v6 1/2] virtio_ring: make virtqueue_add_indirect_packed prettier
Align the arguments of virtqueue_add_indirect_packed() to the open ( to make it look prettier. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_ring.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index dd95dfd85e98..91a46c4da87d 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1050,12 +1050,12 @@ static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg, } static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq, - struct scatterlist *sgs[], - unsigned int total_sg, - unsigned int out_sgs, - unsigned int in_sgs, - void *data, - gfp_t gfp) +struct scatterlist *sgs[], +unsigned int total_sg, +unsigned int out_sgs, +unsigned int in_sgs, +void *data, +gfp_t gfp) { struct vring_packed_desc *desc; struct scatterlist *sg; -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v6 0/2] virtio_ring: check desc == NULL when using indirect with packed
Fix theoretical issues in virtio_ring. v6: -EAGAIN => -ENOMEM v5: Update commit message. v4: Inside the #2 patch, virtqueue_add_indirect_packed() return -EAGAIN when desc == NULL. v3: Update commit message of the #1 patch. v2: Separate the style fix into a single patch. Xuan Zhuo (2): virtio_ring: make virtqueue_add_indirect_packed prettier virtio_ring: check desc == NULL when using indirect with packed drivers/virtio/virtio_ring.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed
On Wed, Oct 20, 2021 at 07:07:43PM +0800, Xuan Zhuo wrote: > On Wed, 20 Oct 2021 04:24:33 -0400, Michael S. Tsirkin > wrote: > > On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote: > > > On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin > > > wrote: > > > > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote: > > > > > When using indirect with packed, we don't check for allocation > > > > > failures. > > > > > This patch checks that and fall back on direct. > > > > > > > > > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support") > > > > > Signed-off-by: Xuan Zhuo > > > > > --- > > > > > drivers/virtio/virtio_ring.c | 13 ++--- > > > > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c > > > > > b/drivers/virtio/virtio_ring.c > > > > > index 91a46c4da87d..44a03b6e4dc4 100644 > > > > > --- a/drivers/virtio/virtio_ring.c > > > > > +++ b/drivers/virtio/virtio_ring.c > > > > > @@ -1065,6 +1065,9 @@ static int virtqueue_add_indirect_packed(struct > > > > > vring_virtqueue *vq, > > > > > > > > > > head = vq->packed.next_avail_idx; > > > > > desc = alloc_indirect_packed(total_sg, gfp); > > > > > + if (!desc) > > > > > + /* fall back on direct */ > > > > > > > > this comment belongs in virtqueue_add_packed right after > > > > return. > > > > > > > > > + return -EAGAIN; > > > > > > > > > > > > > -ENOMEM surely? > > > > > > virtqueue_add_indirect_packed() will return -ENOMEM when dma mmap fails, > > > so we > > > have to choose a different error code. > > > > > > Thanks. > > > > That's exactly the point. Why would we want to recover from allocation > > failure but not DMA map failure? > > >From indirect to direct mode, there is no need to allocate memory, so if we > encounter memory allocation failure, we can fall back from indirect to direct > mode. Memory allocation failure is a temporary problem. > > And if you encounter a dma mmap error, this situation should be notified to > the > upper layer. > > Thanks. Why did dma map fail? A common reason is precisely that we are running out of buffer space. Using direct which does not need extra buffer space thus has a chance to work. > > > > > > > > > > > if (unlikely(vq->vq.num_free < 1)) { > > > > > pr_debug("Can't add buf len 1 - avail = 0\n"); > > > > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct > > > > > virtqueue *_vq, > > > > > unsigned int i, n, c, descs_used, err_idx; > > > > > __le16 head_flags, flags; > > > > > u16 head, id, prev, curr, avail_used_flags; > > > > > + int err; > > > > > > > > > > START_USE(vq); > > > > > > > > > > @@ -1191,9 +1195,12 @@ static inline int virtqueue_add_packed(struct > > > > > virtqueue *_vq, > > > > > > > > > > BUG_ON(total_sg == 0); > > > > > > > > > > - if (virtqueue_use_indirect(_vq, total_sg)) > > > > > - return virtqueue_add_indirect_packed(vq, sgs, total_sg, > > > > > - out_sgs, in_sgs, data, gfp); > > > > > + if (virtqueue_use_indirect(_vq, total_sg)) { > > > > > + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, > > > > > out_sgs, > > > > > + in_sgs, data, gfp); > > > > > + if (err != -EAGAIN) > > > > > + return err; > > > > > + } > > > > > > > > > > head = vq->packed.next_avail_idx; > > > > > avail_used_flags = vq->packed.avail_used_flags; > > > > > -- > > > > > 2.31.0 > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V6] gpio: virtio: Add IRQ support
On Wednesday, October 20, 2021, Viresh Kumar wrote: > This patch adds IRQ support for the virtio GPIO driver. Note that this > uses the irq_bus_lock/unlock() callbacks, since those operations over > virtio may sleep. > > Reviewed-by: Linus Walleij > Signed-off-by: Viresh Kumar > --- > Bartosz, > > The spec changes are close to merging now, I will let you know once the > ballot > is closed and the spec changes are merged. You can then pick this patch for > 5.16. > > V5->V6: > - Sent it separately as the other patches are already merged. > - Queue the buffers only after enabling the irq (as per latest spec). > - Migrate to raw_spinlock_t. > > drivers/gpio/Kconfig | 1 + > drivers/gpio/gpio-virtio.c | 310 ++- > include/uapi/linux/virtio_gpio.h | 25 +++ > 3 files changed, 332 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig > index fae5141251e5..bfa723ff8e7c 100644 > --- a/drivers/gpio/Kconfig > +++ b/drivers/gpio/Kconfig > @@ -1674,6 +1674,7 @@ config GPIO_MOCKUP > config GPIO_VIRTIO > tristate "VirtIO GPIO support" > depends on VIRTIO > + select GPIOLIB_IRQCHIP > help > Say Y here to enable guest support for virtio-based GPIO > controllers. > > diff --git a/drivers/gpio/gpio-virtio.c b/drivers/gpio/gpio-virtio.c > index d24f1c9264bc..73fbe2eda4b9 100644 > --- a/drivers/gpio/gpio-virtio.c > +++ b/drivers/gpio/gpio-virtio.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -28,12 +29,30 @@ struct virtio_gpio_line { > unsigned int rxlen; > }; > > +struct vgpio_irq_line { > + u8 type; > + bool disabled; > + bool masked; > + bool queued; > + bool update_pending; > + bool queue_pending; > + > + struct virtio_gpio_irq_request ireq cacheline_aligned; > + struct virtio_gpio_irq_response ires cacheline_aligned; > +}; > + > struct virtio_gpio { > struct virtio_device *vdev; > struct mutex lock; /* Protects virtqueue operation */ > struct gpio_chip gc; > struct virtio_gpio_line *lines; > struct virtqueue *request_vq; > + > + /* irq support */ > + struct virtqueue *event_vq; > + struct mutex irq_lock; /* Protects irq operation */ > + raw_spinlock_t eventq_lock; /* Protects queuing of the buffer */ > + struct vgpio_irq_line *irq_lines; > }; > > static int _virtio_gpio_req(struct virtio_gpio *vgpio, u16 type, u16 gpio, > @@ -186,6 +205,244 @@ static void virtio_gpio_set(struct gpio_chip *gc, > unsigned int gpio, int value) > virtio_gpio_req(vgpio, VIRTIO_GPIO_MSG_SET_VALUE, gpio, value, > NULL); > } > > +/* Interrupt handling */ > +static void virtio_gpio_irq_prepare(struct virtio_gpio *vgpio, u16 gpio) > +{ > + struct vgpio_irq_line *irq_line = &vgpio->irq_lines[gpio]; > + struct virtio_gpio_irq_request *ireq = &irq_line->ireq; > + struct virtio_gpio_irq_response *ires = &irq_line->ires; > + struct scatterlist *sgs[2], req_sg, res_sg; > + int ret; > + > + if (WARN_ON(irq_line->queued || irq_line->masked || > irq_line->disabled)) > + return; > + > + ireq->gpio = cpu_to_le16(gpio); > + sg_init_one(&req_sg, ireq, sizeof(*ireq)); > + sg_init_one(&res_sg, ires, sizeof(*ires)); > + sgs[0] = &req_sg; > + sgs[1] = &res_sg; > + > + ret = virtqueue_add_sgs(vgpio->event_vq, sgs, 1, 1, irq_line, > GFP_ATOMIC); > + if (ret) { > + dev_err(&vgpio->vdev->dev, "failed to add request to > eventq\n"); > + return; > + } > + > + irq_line->queued = true; > + virtqueue_kick(vgpio->event_vq); > +} > + > +static void virtio_gpio_irq_enable(struct irq_data *d) > +{ > + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); > + struct 
virtio_gpio *vgpio = gpiochip_get_data(gc); > + struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq]; > + > + raw_spin_lock(&vgpio->eventq_lock); > + irq_line->disabled = false; > + irq_line->masked = false; > + irq_line->queue_pending = true; > + raw_spin_unlock(&vgpio->eventq_lock); > + > + irq_line->update_pending = true; > +} > + > +static void virtio_gpio_irq_disable(struct irq_data *d) > +{ > + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); > + struct virtio_gpio *vgpio = gpiochip_get_data(gc); > + struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq]; > + > + raw_spin_lock(&vgpio->eventq_lock); > + irq_line->disabled = true; > + irq_line->masked = true; > + irq_line->queue_pending = false; > + raw_spin_unlock(&vgpio->eventq_lock); > + > + irq_line->update_pending = true; > +} > + > +static void virtio_gpio_irq_mask(struct irq_data *d) > +{ > + struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
Re: [PATCH v5 04/16] x86/tdx: Make pages shared in ioremap()
On 10/8/21 7:36 PM, Kuppuswamy Sathyanarayanan wrote: From: "Kirill A. Shutemov" All ioremap()ed pages that are not backed by normal memory (NONE or RESERVED) have to be mapped as shared. Reuse the infrastructure from AMD SEV code. Note that DMA code doesn't use ioremap() to convert memory to shared as DMA buffers backed by normal memory. DMA code make buffer shared with set_memory_decrypted(). Signed-off-by: Kirill A. Shutemov Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan --- Changes since v4: * Renamed "protected_guest" to "cc_guest". * Replaced use of prot_guest_has() with cc_guest_has() * Modified the patch to adapt to latest version cc_guest_has() changes. Changes since v3: * Rebased on top of Tom Lendacky's protected guest changes (https://lore.kernel.org/patchwork/cover/1468760/) Changes since v1: * Fixed format issues in commit log. arch/x86/include/asm/pgtable.h | 4 arch/x86/mm/ioremap.c | 8 ++-- include/linux/cc_platform.h| 13 + 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 448cd01eb3ec..ecefccbdf2e3 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -21,6 +21,10 @@ #define pgprot_encrypted(prot)__pgprot(__sme_set(pgprot_val(prot))) #define pgprot_decrypted(prot)__pgprot(__sme_clr(pgprot_val(prot))) +/* Make the page accesable by VMM for confidential guests */ +#define pgprot_cc_guest(prot) __pgprot(pgprot_val(prot) | \ + tdx_shared_mask()) So this is only for TDX guests, so maybe a name that is less generic. Alternatively, you could create pgprot_private()/pgprot_shared() functions that check for SME/SEV or TDX and do the proper thing. Then you can redefine pgprot_encrypted()/pgprot_decrypted() to the new functions? + #ifndef __ASSEMBLY__ #include #include diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 026031b3b782..83daa3f8f39c 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -26,6 +27,7 @@ #include #include #include +#include #include "physaddr.h" @@ -87,8 +89,8 @@ static unsigned int __ioremap_check_ram(struct resource *res) } /* - * In a SEV guest, NONE and RESERVED should not be mapped encrypted because - * there the whole memory is already encrypted. + * In a SEV or TDX guest, NONE and RESERVED should not be mapped encrypted (or + * private in TDX case) because there the whole memory is already encrypted. */ static unsigned int __ioremap_check_encrypted(struct resource *res) { @@ -246,6 +248,8 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size, prot = PAGE_KERNEL_IO; if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted) prot = pgprot_encrypted(prot); + else if (cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT)) + prot = pgprot_cc_guest(prot); And if you do the new functions this could be: if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted) prot = pgprot_private(prot); else prot = pgprot_shared(prot); Thanks, Tom switch (pcm) { case _PAGE_CACHE_MODE_UC: diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index 7728574d7783..edb1d7a2f6af 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -81,6 +81,19 @@ enum cc_attr { * Examples include TDX Guest. */ CC_ATTR_GUEST_UNROLL_STRING_IO, + + /** +* @CC_ATTR_GUEST_SHARED_MAPPING_INIT: IO Remapped memory is marked +* as shared. 
+* +* The platform/OS is running as a guest/virtual machine and +* initializes all IO remapped memory as shared. +* +* Examples include TDX Guest (SEV marks all pages as shared by default +* so this feature cannot be enabled for it). +*/ + CC_ATTR_GUEST_SHARED_MAPPING_INIT, + }; #ifdef CONFIG_ARCH_HAS_CC_PLATFORM ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
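A minimal sketch of the pgprot_private()/pgprot_shared() helpers suggested above, built only from symbols already used in this series (tdx_shared_mask(), __sme_set()/__sme_clr(), cc_platform_has()); the attribute name and placement are assumptions:

static inline pgprot_t pgprot_private(pgprot_t prot)
{
	if (cc_platform_has(CC_ATTR_GUEST_TDX))
		return __pgprot(pgprot_val(prot) & ~tdx_shared_mask());
	return __pgprot(__sme_set(pgprot_val(prot)));
}

static inline pgprot_t pgprot_shared(pgprot_t prot)
{
	if (cc_platform_has(CC_ATTR_GUEST_TDX))
		return __pgprot(pgprot_val(prot) | tdx_shared_mask());
	return __pgprot(__sme_clr(pgprot_val(prot)));
}

With helpers like these, the __ioremap_caller() hunk reduces to choosing pgprot_private() or pgprot_shared(), and pgprot_encrypted()/pgprot_decrypted() could be redefined on top of them as suggested.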
Re: [PATCH v5 01/16] x86/mm: Move force_dma_unencrypted() to common code
On 10/8/21 7:36 PM, Kuppuswamy Sathyanarayanan wrote: From: "Kirill A. Shutemov" Intel TDX doesn't allow VMM to access guest private memory. Any memory that is required for communication with VMM must be shared explicitly by setting the bit in page table entry. After setting the shared bit, the conversion must be completed with MapGPA hypercall. Details about MapGPA hypercall can be found in [1], sec 3.2. The call informs VMM about the conversion between private/shared mappings. The shared memory is similar to unencrypted memory in AMD SME/SEV terminology but the underlying process of sharing/un-sharing the memory is different for Intel TDX guest platform. SEV assumes that I/O devices can only do DMA to "decrypted" physical addresses without the C-bit set. In order for the CPU to interact with this memory, the CPU needs a decrypted mapping. To add this support, AMD SME code forces force_dma_unencrypted() to return true for platforms that support AMD SEV feature. It will be used for DMA memory allocation API to trigger set_memory_decrypted() for platforms that support AMD SEV feature. TDX is similar. So, to communicate with I/O devices, related pages need to be marked as shared. As mentioned above, shared memory in TDX architecture is similar to decrypted memory in AMD SME/SEV. So similar to AMD SEV, force_dma_unencrypted() has to forced to return true. This support is added in other patches in this series. So move force_dma_unencrypted() out of AMD specific code and call AMD specific (amd_force_dma_unencrypted()) initialization function from it. force_dma_unencrypted() will be modified by later patches to include Intel TDX guest platform specific initialization. Also, introduce new config option X86_MEM_ENCRYPT_COMMON that has to be selected by all x86 memory encryption features. This will be selected by both AMD SEV and Intel TDX guest config options. This is preparation for TDX changes in DMA code and it has no functional change. Can force_dma_unencrypted() be moved to arch/x86/kernel/cc_platform.c, instead of creating a new file? It might fit better with patch #6. Thanks, Tom [1] - https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf Signed-off-by: Kirill A. Shutemov Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan --- Changes since v4: * Removed used we/you from commit log. Change since v3: * None Changes since v1: * Removed sev_active(), sme_active() checks in force_dma_unencrypted(). arch/x86/Kconfig | 8 ++-- arch/x86/include/asm/mem_encrypt_common.h | 18 ++ arch/x86/mm/Makefile | 2 ++ arch/x86/mm/mem_encrypt.c | 3 ++- arch/x86/mm/mem_encrypt_common.c | 17 + 5 files changed, 45 insertions(+), 3 deletions(-) create mode 100644 arch/x86/include/asm/mem_encrypt_common.h create mode 100644 arch/x86/mm/mem_encrypt_common.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index af49ad084919..37b27412f52e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1519,16 +1519,20 @@ config X86_CPA_STATISTICS helps to determine the effectiveness of preserving large and huge page mappings when mapping protections are changed. 
+config X86_MEM_ENCRYPT_COMMON + select ARCH_HAS_FORCE_DMA_UNENCRYPTED + select DYNAMIC_PHYSICAL_MASK + def_bool n + config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD select DMA_COHERENT_POOL - select DYNAMIC_PHYSICAL_MASK select ARCH_USE_MEMREMAP_PROT - select ARCH_HAS_FORCE_DMA_UNENCRYPTED select INSTRUCTION_DECODER select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS select ARCH_HAS_CC_PLATFORM + select X86_MEM_ENCRYPT_COMMON help Say yes to enable support for the encryption of system memory. This requires an AMD processor that supports Secure Memory diff --git a/arch/x86/include/asm/mem_encrypt_common.h b/arch/x86/include/asm/mem_encrypt_common.h new file mode 100644 index ..697bc40a4e3d --- /dev/null +++ b/arch/x86/include/asm/mem_encrypt_common.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2020 Intel Corporation */ +#ifndef _ASM_X86_MEM_ENCRYPT_COMMON_H +#define _ASM_X86_MEM_ENCRYPT_COMMON_H + +#include +#include + +#ifdef CONFIG_AMD_MEM_ENCRYPT +bool amd_force_dma_unencrypted(struct device *dev); +#else /* CONFIG_AMD_MEM_ENCRYPT */ +static inline bool amd_force_dma_unencrypted(struct device *dev) +{ + return false; +} +#endif /* CONFIG_AMD_MEM_ENCRYPT */ + +#endif diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 5864219221ca..b31cb52bf1bd 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -52,6 +52,8 @@ obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTI
Re: [PATCH v5 06/16] x86/tdx: Make DMA pages shared
On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote: From: "Kirill A. Shutemov" Just like MKTME, TDX reassigns bits of the physical address for metadata. MKTME used several bits for an encryption KeyID. TDX uses a single bit in guests to communicate whether a physical page should be protected by TDX as private memory (bit set to 0) or unprotected and shared with the VMM (bit set to 1). __set_memory_enc_dec() is now aware about TDX and sets Shared bit accordingly following with relevant TDX hypercall. Also, Do TDX_ACCEPT_PAGE on every 4k page after mapping the GPA range when converting memory to private. Using 4k page size limit is due to current TDX spec restriction. Also, If the GPA (range) was already mapped as an active, private page, the host VMM may remove the private page from the TD by following the “Removing TD Private Pages” sequence in the Intel TDX-module specification [1] to safely block the mapping(s), flush the TLB and cache, and remove the mapping(s). BUG() if TDX_ACCEPT_PAGE fails (except "previously accepted page" case) , as the guest is completely hosed if it can't access memory. [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsoftware.intel.com%2Fcontent%2Fdam%2Fdevelop%2Fexternal%2Fus%2Fen%2Fdocuments%2Ftdx-module-1eas-v0.85.039.pdf&data=04%7C01%7Cthomas.lendacky%40amd.com%7C0e667adf5a4042abce3908d98abd07a8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637693367201703893%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=UGxQ9xBjWsmev7PetX%2BuS0RChkAXyaH7q6JHO9ZiUtY%3D&reserved=0 Tested-by: Kai Huang Signed-off-by: Kirill A. Shutemov Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan ... diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c index f063c885b0a5..119a9056efbb 100644 --- a/arch/x86/mm/mem_encrypt_common.c +++ b/arch/x86/mm/mem_encrypt_common.c @@ -9,9 +9,18 @@ #include #include +#include /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ bool force_dma_unencrypted(struct device *dev) { - return amd_force_dma_unencrypted(dev); + if (cc_platform_has(CC_ATTR_GUEST_TDX) && + cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) + return true; + + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) || + cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) + return amd_force_dma_unencrypted(dev); + + return false; Assuming the original force_dma_unencrypted() function was moved here or cc_platform.c, then you shouldn't need any changes. Both SEV and TDX require true be returned if CC_ATTR_GUEST_MEM_ENCRYPT returns true. And then TDX should never return true for CC_ATTR_HOST_MEM_ENCRYPT. 
> }
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 527957586f3c..6c531d5cb5fd 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -30,6 +30,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "../mm_internal.h"
>
> @@ -1981,8 +1982,10 @@ int set_memory_global(unsigned long addr, int numpages)
>  				    __pgprot(_PAGE_GLOBAL), 0);
>  }
>
> -static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +static int __set_memory_protect(unsigned long addr, int numpages, bool protect)
>  {
> +	pgprot_t mem_protected_bits, mem_plain_bits;
> +	enum tdx_map_type map_type;
>  	struct cpa_data cpa;
>  	int ret;
>
> @@ -1997,8 +2000,25 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	memset(&cpa, 0, sizeof(cpa));
>  	cpa.vaddr = &addr;
>  	cpa.numpages = numpages;
> -	cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
> -	cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
> +
> +	if (cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT)) {
> +		mem_protected_bits = __pgprot(0);
> +		mem_plain_bits = pgprot_cc_shared_mask();

How about having generic versions for both shared and private that return
the proper value for SEV or TDX. Then this remains looking similar to how
it does now, just replacing the __pgprot() calls with the appropriate
pgprot_cc_{shared,private}_mask().

Thanks,
Tom

> +	} else {
> +		mem_protected_bits = __pgprot(_PAGE_ENC);
> +		mem_plain_bits = __pgprot(0);
> +	}
> +
> +	if (protect) {
> +		cpa.mask_set = mem_protected_bits;
> +		cpa.mask_clr = mem_plain_bits;
> +		map_type = TDX_MAP_PRIVATE;
> +	} else {
> +		cpa.mask_set = mem_plain_bits;
> +		cpa.mask_clr = mem_protected_bits;
> +		map_type = TDX_MAP_SHARED;
> +	}
> +
>  	cpa.pgd = init_mm.pgd;
>
>  	/* Must avoid aliasing mappings in the highmem code */
> @@ -2006,9 +2026,17 @@ static int __set_memory_enc_dec(unsigne
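To make the suggestion concrete, a minimal sketch of the consolidated helper Tom describes might look like the following. This is not the posted patch: the placement (cc_platform.c or mem_encrypt_common.c) is still open in the thread, and the SME branch below is paraphrased from the existing AMD implementation rather than quoted from it.

	#include <linux/cc_platform.h>
	#include <linux/dma-mapping.h>
	#include <linux/device.h>
	#include <linux/bitops.h>
	#include <asm/mem_encrypt.h>	/* sme_me_mask */

	bool force_dma_unencrypted(struct device *dev)
	{
		/* SEV and TDX guests: all DMA must go through shared (decrypted) pages. */
		if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
			return true;

		/*
		 * SME host: force unencrypted DMA only for devices whose DMA mask
		 * cannot cover addresses that include the encryption bit.
		 */
		if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
			u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
			u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
							dev->bus_dma_limit);

			if (dma_dev_mask <= dma_enc_mask)
				return true;
		}

		return false;
	}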
Re: [PATCH v5 07/16] x86/kvm: Use bounce buffers for TD guest
On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov"
>
> Intel TDX doesn't allow VMM to directly access guest private memory. Any
> memory that is required for communication with VMM must be shared
> explicitly. The same rule applies for any DMA to and from TDX guest. All
> DMA pages have to be marked as shared pages. A generic way to achieve this
> without any changes to device drivers is to use the SWIOTLB framework.
>
> This method of handling is similar to AMD SEV. So extend this support for
> TDX guest as well. Also, since there is some common code between AMD SEV
> and TDX guest in mem_encrypt_init(), move it to mem_encrypt_common.c and
> call the AMD-specific init function from it.
>
> Signed-off-by: Kirill A. Shutemov
> Reviewed-by: Andi Kleen
> Reviewed-by: Tony Luck
> Signed-off-by: Kuppuswamy Sathyanarayanan
> ---
> Changes since v4:
>  * Replaced prot_guest_has() with cc_guest_has().
>
> Changes since v3:
>  * Rebased on top of Tom Lendacky's protected guest changes
>    (https://lore.kernel.org/patchwork/cover/1468760/)
>
> Changes since v1:
>  * Removed sme_me_mask check for amd_mem_encrypt_init() in
>    mem_encrypt_init().
>
>  arch/x86/include/asm/mem_encrypt_common.h |  3 +++
>  arch/x86/kernel/tdx.c                     |  2 ++
>  arch/x86/mm/mem_encrypt.c                 |  5 +
>  arch/x86/mm/mem_encrypt_common.c          | 14 ++
>  4 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt_common.h b/arch/x86/include/asm/mem_encrypt_common.h
> index 697bc40a4e3d..bc90e565bce4 100644
> --- a/arch/x86/include/asm/mem_encrypt_common.h
> +++ b/arch/x86/include/asm/mem_encrypt_common.h
> @@ -8,11 +8,14 @@
>
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  bool amd_force_dma_unencrypted(struct device *dev);
> +void __init amd_mem_encrypt_init(void);
>  #else /* CONFIG_AMD_MEM_ENCRYPT */
>  static inline bool amd_force_dma_unencrypted(struct device *dev)
>  {
>  	return false;
>  }
> +
> +static inline void amd_mem_encrypt_init(void) {}
>  #endif /* CONFIG_AMD_MEM_ENCRYPT */
>
>  #endif
>
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 433f366ca25c..ce8e3019b812 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -12,6 +12,7 @@
>  #include
>  #include
>  #include	/* force_sig_fault() */
> +#include
>
>  /* TDX Module call Leaf IDs */
>  #define TDX_GET_INFO		1
> @@ -577,6 +578,7 @@ void __init tdx_early_init(void)
>  	pv_ops.irq.halt = tdx_halt;
>
>  	legacy_pic = &null_legacy_pic;
> +	swiotlb_force = SWIOTLB_FORCE;
>
>  	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "tdx:cpu_hotplug",
>  			  NULL, tdx_cpu_offline_prepare);
>
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 5d7fbed73949..8385bc4565e9 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -438,14 +438,11 @@ static void print_mem_encrypt_feature_info(void)
>  }
>
>  /* Architecture __weak replacement functions */
> -void __init mem_encrypt_init(void)
> +void __init amd_mem_encrypt_init(void)
>  {
>  	if (!sme_me_mask)
>  		return;
>
> -	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
> -	swiotlb_update_mem_attributes();
> -
>  	/*
>  	 * With SEV, we need to unroll the rep string I/O instructions,
>  	 * but SEV-ES supports them through the #VC handler.
> diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
> index 119a9056efbb..6fe44c6cb753 100644
> --- a/arch/x86/mm/mem_encrypt_common.c
> +++ b/arch/x86/mm/mem_encrypt_common.c
> @@ -10,6 +10,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>  bool force_dma_unencrypted(struct device *dev)
> @@ -24,3 +25,16 @@ bool force_dma_unencrypted(struct device *dev)
>
>  	return false;
>  }
> +
> +/* Architecture __weak replacement functions */
> +void __init mem_encrypt_init(void)
> +{
> +	/*
> +	 * For TDX guest or SEV/SME, call into SWIOTLB to update
> +	 * the SWIOTLB DMA buffers
> +	 */
> +	if (sme_me_mask || cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))

Can't you just make this:

	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))

SEV will return true if sme_me_mask is not zero and TDX should only return
true if it is a TDX guest, right?

Thanks,
Tom

> +		swiotlb_update_mem_attributes();
> +
> +	amd_mem_encrypt_init();
> +}
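Putting the suggestion together, the resulting function would look roughly like this. It is a sketch of the simplification being discussed, not the posted patch; amd_mem_encrypt_init() is the helper introduced by this series.

	#include <linux/swiotlb.h>
	#include <linux/cc_platform.h>

	void __init mem_encrypt_init(void)
	{
		/* One check covers SME/SEV hosts and guests as well as TDX guests. */
		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
			swiotlb_update_mem_attributes();

		amd_mem_encrypt_init();
	}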
Re: [PATCH v5 06/16] x86/tdx: Make DMA pages shared
On 10/20/21 11:45 AM, Sathyanarayanan Kuppuswamy wrote:
> On 10/20/21 9:33 AM, Tom Lendacky wrote:
> > On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote:
> > > ...
> > >  bool force_dma_unencrypted(struct device *dev)
> > >  {
> > > -	return amd_force_dma_unencrypted(dev);
> > > +	if (cc_platform_has(CC_ATTR_GUEST_TDX) &&
> > > +	    cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> > > +		return true;
> > > +
> > > +	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) ||
> > > +	    cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
> > > +		return amd_force_dma_unencrypted(dev);
> > > +
> > > +	return false;
> >
> > Assuming the original force_dma_unencrypted() function was moved here or
> > cc_platform.c, then you shouldn't need any changes. Both SEV and TDX
> > require true be returned if CC_ATTR_GUEST_MEM_ENCRYPT returns true. And
> > then TDX should never return true for CC_ATTR_HOST_MEM_ENCRYPT.
>
> For the non-TDX case, in CC_ATTR_HOST_MEM_ENCRYPT, we should still call
> amd_force_dma_unencrypted(), right?

What I'm saying is that you wouldn't have amd_force_dma_unencrypted(). I
think the whole force_dma_unencrypted() can exist as-is in a different
file, whether that's cc_platform.c or mem_encrypt_common.c. It will return
true for an SEV or TDX guest, true for an SME host based on the DMA mask,
or else false. That should work just fine for TDX.

Thanks,
Tom

> > > }
> > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > > index 527957586f3c..6c531d5cb5fd 100644
> > > --- a/arch/x86/mm/pat/set_memory.c
> > > +++ b/arch/x86/mm/pat/set_memory.c
> > > @@ -30,6 +30,7 @@
> > >  #include
> > >  #include
> > >  #include
> > > +#include
> > >
> > >  #include "../mm_internal.h"
> > >
> > > @@ -1981,8 +1982,10 @@ int set_memory_global(unsigned long addr, int numpages)
> > >  				    __pgprot(_PAGE_GLOBAL), 0);
> > >  }
> > >
> > > -static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> > > +static int __set_memory_protect(unsigned long addr, int numpages, bool protect)
> > >  {
> > > +	pgprot_t mem_protected_bits, mem_plain_bits;
> > > +	enum tdx_map_type map_type;
> > >  	struct cpa_data cpa;
> > >  	int ret;
> > >
> > > @@ -1997,8 +2000,25 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> > >  	memset(&cpa, 0, sizeof(cpa));
> > >  	cpa.vaddr = &addr;
> > >  	cpa.numpages = numpages;
> > > -	cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
> > > -	cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
> > > +
> > > +	if (cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT)) {
> > > +		mem_protected_bits = __pgprot(0);
> > > +		mem_plain_bits = pgprot_cc_shared_mask();
> >
> > How about having generic versions for both shared and private that
> > return the proper value for SEV or TDX. Then this remains looking
> > similar to how it does now, just replacing the __pgprot() calls with
> > the appropriate pgprot_cc_{shared,private}_mask().
>
> Makes sense.

> > Thanks,
> > Tom
> >
> > > +	} else {
> > > +		mem_protected_bits = __pgprot(_PAGE_ENC);
> > > +		mem_plain_bits = __pgprot(0);
> > > +	}
> > > +
> > > +	if (protect) {
> > > +		cpa.mask_set = mem_protected_bits;
> > > +		cpa.mask_clr = mem_plain_bits;
> > > +		map_type = TDX_MAP_PRIVATE;
> > > +	} else {
> > > +		cpa.mask_set = mem_plain_bits;
> > > +		cpa.mask_clr = mem_protected_bits;
> > > +		map_type = TDX_MAP_SHARED;
> > > +	}
> > > +
> > >  	cpa.pgd = init_mm.pgd;
> > >
> > >  	/* Must avoid aliasing mappings in the highmem code */
> > > @@ -2006,9 +2026,17 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> > >  	vm_unmap_aliases();
> > >
> > >  	/*
> > > -	 * Before changing the encryption attribute, we need to flush caches.
> > > +	 * Before changing the encryption attribute, flush caches.
> > > +	 *
> > > +	 * For TDX, guest is responsible for flushing caches on private->shared
> > > +	 * transition. VMM is responsible for flushing on shared->private.
> > >  	 */
> > > -	cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
> > > +	if (cc_platform_has(CC_ATTR_GUEST_TDX)) {
> > > +		if (map_type == TDX_MAP_SHARED)
> > > +			cpa_flush(&cpa, 1);
> > > +	} else {
> > > +		cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
> > > +	}
> > >
> > >  	ret = __change_page_attr_set_clr(&cpa, 1);
> > >
> > > @@ -2021,18 +2049,21 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> > >  	 */
> > >  	cpa_flush(&cpa, 0);
> > >
> > > +	if (!ret && cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT))
> > > +		ret = tdx_hcall_gpa_intent(__pa(addr), numpages, map_type);
> > > +
> > >  	return ret;
> > >  }
> > >
> > >  int set_memory_encrypted(unsigned long addr, int numpages)
> > >  {
> > > -	return __set_memory_enc_dec(addr, numpages, true);
> > > +	return __set_memory_protect(addr, numpages, true);
> > >  }
> > >  EXPORT_SYMBOL_GPL(set_memory_encrypted);
> > >
> > >  int set_memory_decrypted(unsigned long addr, int numpages)
> > >  {
> > > -	return __set_memory_enc_dec(addr, numpages, false);
> > > +	return __set_memory_protect(addr, numpages, false);
> > >  }
> > >  EXPORT_SYMBOL_GPL(set_memory_decrypted);
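To illustrate the shape of the change Tom suggests, a rough sketch of the two generic helpers and their use in __set_memory_protect() could look like this. The helper names come from the review comment; pgprot_cc_shared_mask() exists in the posted series, while pgprot_cc_private_mask() and the tdx_shared_mask() call are assumptions made only for the sketch:

	/* Sketch only: generic "what do I set/clear" helpers for SEV vs TDX. */
	static pgprot_t pgprot_cc_shared_mask(void)
	{
		if (cc_platform_has(CC_ATTR_GUEST_TDX))
			return __pgprot(tdx_shared_mask());	/* TDX: Shared bit set */

		return __pgprot(0);				/* SEV/SME: nothing extra */
	}

	static pgprot_t pgprot_cc_private_mask(void)
	{
		if (cc_platform_has(CC_ATTR_GUEST_TDX))
			return __pgprot(0);			/* TDX: private = Shared bit clear */

		return __pgprot(_PAGE_ENC);			/* SEV/SME: C-bit set */
	}

	static int __set_memory_protect(unsigned long addr, int numpages, bool protect)
	{
		struct cpa_data cpa;

		memset(&cpa, 0, sizeof(cpa));
		cpa.vaddr = &addr;
		cpa.numpages = numpages;
		cpa.mask_set = protect ? pgprot_cc_private_mask() : pgprot_cc_shared_mask();
		cpa.mask_clr = protect ? pgprot_cc_shared_mask() : pgprot_cc_private_mask();
		/* Rest as in the patch: cpa_flush(), __change_page_attr_set_clr(), hypercall. */
		return 0;
	}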
Re: [PATCH v5 07/16] x86/kvm: Use bounce buffers for TD guest
On 10/20/21 11:50 AM, Sathyanarayanan Kuppuswamy wrote:
> On 10/20/21 9:39 AM, Tom Lendacky wrote:
> > On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote:
> > > From: "Kirill A. Shutemov"
> > >
> > > Intel TDX doesn't allow VMM to directly access guest private memory.
> > > Any memory that is required for communication with VMM must be shared
> > > explicitly. The same rule applies for any DMA to and from TDX guest.
> > > All DMA pages have to be marked as shared pages. A generic way to
> > > achieve this without any changes to device drivers is to use the
> > > SWIOTLB framework.
> > >
> > > [...]
> > > diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
> > > index 119a9056efbb..6fe44c6cb753 100644
> > > --- a/arch/x86/mm/mem_encrypt_common.c
> > > +++ b/arch/x86/mm/mem_encrypt_common.c
> > > @@ -10,6 +10,7 @@
> > >  #include
> > >  #include
> > >  #include
> > > +#include
> > >
> > >  /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
> > >  bool force_dma_unencrypted(struct device *dev)
> > > @@ -24,3 +25,16 @@ bool force_dma_unencrypted(struct device *dev)
> > >
> > >  	return false;
> > >  }
> > > +
> > > +/* Architecture __weak replacement functions */
> > > +void __init mem_encrypt_init(void)
> > > +{
> > > +	/*
> > > +	 * For TDX guest or SEV/SME, call into SWIOTLB to update
> > > +	 * the SWIOTLB DMA buffers
> > > +	 */
> > > +	if (sme_me_mask || cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> >
> > Can't you just make this:
> >
> > 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >
> > SEV will return true if sme_me_mask is not zero and TDX should only
> > return true if it is a TDX guest, right?
>
> Yes. It can be simplified. But where shall we leave this function,
> cc_platform.c or here?

Either one works... all depends on how the maintainers feel about
creating/using mem_encrypt_common.c or using cc_platform.c.

Thanks,
Tom

> > Thanks,
> > Tom
> >
> > > +		swiotlb_update_mem_attributes();
> > > +
> > > +
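For reference, the reason the explicit sme_me_mask check is redundant is that the AMD cc_platform backend already keys CC_ATTR_MEM_ENCRYPT off sme_me_mask. A condensed sketch of that backend, paraphrased from memory of arch/x86/kernel/cc_platform.c around 5.15 (check the tree for the authoritative version):

	static bool amd_cc_platform_has(enum cc_attr attr)
	{
		switch (attr) {
		case CC_ATTR_MEM_ENCRYPT:
			/* True for SME hosts and SEV guests alike. */
			return sme_me_mask != 0;
		case CC_ATTR_HOST_MEM_ENCRYPT:
			return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
		case CC_ATTR_GUEST_MEM_ENCRYPT:
			return sev_status & MSR_AMD64_SEV_ENABLED;
		default:
			return false;
		}
	}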
Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed
On Wed, 20 Oct 2021 10:52:24 -0400, Michael S. Tsirkin wrote:
> On Wed, Oct 20, 2021 at 07:07:43PM +0800, Xuan Zhuo wrote:
> > On Wed, 20 Oct 2021 04:24:33 -0400, Michael S. Tsirkin wrote:
> > > On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote:
> > > > On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote:
> > > > > > When using indirect with packed, we don't check for allocation
> > > > > > failures.
> > > > > > This patch checks that and fall back on direct.
> > > > > >
> > > > > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
> > > > > > Signed-off-by: Xuan Zhuo
> > > > > > ---
> > > > > >  drivers/virtio/virtio_ring.c | 13 ++---
> > > > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 91a46c4da87d..44a03b6e4dc4 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -1065,6 +1065,9 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > > > >
> > > > > >  	head = vq->packed.next_avail_idx;
> > > > > >  	desc = alloc_indirect_packed(total_sg, gfp);
> > > > > > +	if (!desc)
> > > > > > +		/* fall back on direct */
> > > > >
> > > > > this comment belongs in virtqueue_add_packed right after
> > > > > return.
> > > > >
> > > > > > +		return -EAGAIN;
> > > > >
> > > > > -ENOMEM surely?
> > > >
> > > > virtqueue_add_indirect_packed() will return -ENOMEM when dma mmap
> > > > fails, so we have to choose a different error code.
> > > >
> > > > Thanks.
> > >
> > > That's exactly the point. Why would we want to recover from allocation
> > > failure but not DMA map failure?
> >
> > From indirect to direct mode, there is no need to allocate memory, so if we
> > encounter memory allocation failure, we can fall back from indirect to
> > direct mode. Memory allocation failure is a temporary problem.
> >
> > And if you encounter a dma mmap error, this situation should be notified
> > to the upper layer.
> >
> > Thanks.
>
> Why did dma map fail?
> A common reason is precisely that we are running
> out of buffer space. Using direct which does not
> need extra buffer space thus has a chance to work.

It turns out that this is the case, then I have no problem.

Thanks.
> > > > > >
> > > > > >  	if (unlikely(vq->vq.num_free < 1)) {
> > > > > >  		pr_debug("Can't add buf len 1 - avail = 0\n");
> > > > > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > > >  	unsigned int i, n, c, descs_used, err_idx;
> > > > > >  	__le16 head_flags, flags;
> > > > > >  	u16 head, id, prev, curr, avail_used_flags;
> > > > > > +	int err;
> > > > > >
> > > > > >  	START_USE(vq);
> > > > > >
> > > > > > @@ -1191,9 +1195,12 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > > >
> > > > > >  	BUG_ON(total_sg == 0);
> > > > > >
> > > > > > -	if (virtqueue_use_indirect(_vq, total_sg))
> > > > > > -		return virtqueue_add_indirect_packed(vq, sgs, total_sg,
> > > > > > -				out_sgs, in_sgs, data, gfp);
> > > > > > +	if (virtqueue_use_indirect(_vq, total_sg)) {
> > > > > > +		err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
> > > > > > +				in_sgs, data, gfp);
> > > > > > +		if (err != -EAGAIN)
> > > > > > +			return err;
> > > > > > +	}
> > > > > >
> > > > > >  	head = vq->packed.next_avail_idx;
> > > > > >  	avail_used_flags = vq->packed.avail_used_flags;
> > > > > > --
> > > > > > 2.31.0
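For comparison, the split ring already recovers from an indirect-table allocation failure by falling back to direct descriptors. Roughly, and condensed/paraphrased from virtqueue_add_split() rather than quoted exactly:

	/* Try an indirect descriptor table when it is worthwhile. */
	if (virtqueue_use_indirect(_vq, total_sg))
		desc = alloc_indirect_split(_vq, total_sg, gfp);
	else
		desc = NULL;

	if (desc) {
		/* Indirect: the whole request occupies a single ring slot. */
		indirect = true;
		i = 0;
		descs_used = 1;
	} else {
		/* Allocation failed (or indirect not worthwhile): use the ring directly. */
		indirect = false;
		desc = vq->split.vring.desc;
		i = head;
		descs_used = total_sg;
	}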
Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
On Wed, Oct 20, 2021 at 7:57 PM Eugenio Perez Martin wrote:
>
> On Wed, Oct 20, 2021 at 11:03 AM Jason Wang wrote:
> >
> > On Wed, Oct 20, 2021 at 2:52 PM Eugenio Perez Martin wrote:
> > >
> > > On Wed, Oct 20, 2021 at 4:07 AM Jason Wang wrote:
> > > >
> > > > On Wed, Oct 20, 2021 at 10:02 AM Jason Wang wrote:
> > > > >
> > > > > On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin wrote:
> > > > > >
> > > > > > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 2021/10/1 3:06 PM, Eugenio Pérez wrote:
> > > > > > > > Use translations added in VhostIOVATree in SVQ.
> > > > > > > >
> > > > > > > > Now every element needs to store the previous address also, so VirtQueue
> > > > > > > > can consume the elements properly. This adds a little overhead per VQ
> > > > > > > > element, having to allocate more memory to stash them. As a possible
> > > > > > > > optimization, this allocation could be avoided if the descriptor is not
> > > > > > > > a chain but a single one, but this is left undone.
> > > > > > > >
> > > > > > > > TODO: iova range should be queried before, and add logic to fail when
> > > > > > > > GPA is outside of its range and memory listener or svq add it.
> > > > > > > >
> > > > > > > > Signed-off-by: Eugenio Pérez
> > > > > > > > ---
> > > > > > > >  hw/virtio/vhost-shadow-virtqueue.h |   4 +-
> > > > > > > >  hw/virtio/vhost-shadow-virtqueue.c | 130 -
> > > > > > > >  hw/virtio/vhost-vdpa.c             |  40 -
> > > > > > > >  hw/virtio/trace-events             |   1 +
> > > > > > > >  4 files changed, 152 insertions(+), 23 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > index b7baa424a7..a0e6b5267a 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > @@ -11,6 +11,7 @@
> > > > > > > >  #define VHOST_SHADOW_VIRTQUEUE_H
> > > > > > > >
> > > > > > > >  #include "hw/virtio/vhost.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > > > > >
> > > > > > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > > > > >  void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > >                      VhostShadowVirtqueue *svq);
> > > > > > > >
> > > > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> > > > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > > > +                                    VhostIOVATree *iova_map);
> > > > > > > >
> > > > > > > >  void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > index 2fd0bab75d..9db538547e 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > @@ -11,12 +11,19 @@
> > > > > > > >  #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > > > > > >  #include "hw/virtio/vhost.h"
> > > > > > > >  #include "hw/virtio/virtio-access.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > >  #include "standard-headers/linux/vhost_types.h"
> > > > > > > >
> > > > > > > >  #include "qemu/error-report.h"
"qemu/error-report.h" > > > > > > > > #include "qemu/main-loop.h" > > > > > > > > > > > > > > > > +typedef struct SVQElement { > > > > > > > > +VirtQueueElement elem; > > > > > > > > +void **in_sg_stash; > > > > > > > > +void **out_sg_stash; > > > > > > > > +} SVQElement; > > > > > > > > + > > > > > > > > /* Shadow virtqueue to relay notifications */ > > > > > > > > typedef struct VhostShadowVirtqueue { > > > > > > > > /* Shadow vring */ > > > > > > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue { > > > > > > > > /* Virtio device */ > > > > > > > > VirtIODevice *vdev; > > > > > > > > > > > > > > > > +/* IOVA mapping if used */ > > > > > > > > +VhostIOVATree *iova_map; > > > > > > > > + > > > > > > > > /* Map for returning guest's descriptors */ > > > > > > > > -VirtQueueElement **ring_id_maps; > > > > > > > > +SVQElement **ring_id_maps; > > > > > > > > > > > > > > > > /* Next head to expose to device */ > > > > > > > > uint16_t avail_idx_shadow; > > > > > > > > @@ -79,13 +89,6 @@ bool > > > > > > > > vhost_svq_valid_device_features(uint64_t *dev_features) > > > > > > > > continue; > > > > > > > > > > > > > > > > case VIRTIO_F_ACCESS_PLATFORM: > > > > > > >
Re: [PATCH 1/2] i2c: virtio: disable timeout handling
On 2021/10/20 19:03, Viresh Kumar wrote:
> On 20-10-21, 12:55, Vincent Whitchurch wrote:
> > If the timeout cannot be disabled, then the driver should be fixed to
> > always copy buffers and hold on to them to avoid memory corruption in
> > the case of timeout, as I mentioned in my commit message. That would be
> > quite a substantial change to the driver so it's not something I'm
> > personally comfortable with doing, especially not this late in the -rc
> > cycle, so I'd leave that to others.
>
> Or we can avoid clearing up and freeing the buffers here until the point
> where the buffers are returned by the host. Until that happens, we can
> avoid taking new requests but return to the earlier caller with a timeout
> failure. That would avoid corruption, by freeing buffers sooner, and not
> hanging the kernel.

That seems similar to using "wait_for_completion". If the other side is
hacked, the guest may never get the buffers returned by the host, right?

For the moment, we can solve the problem by using a hardcoded big value or
disabling the timeout. Over the long term, I think the backend should
provide that timeout value and guarantee that its processing time does not
exceed that value.
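For context, the two options being weighed differ only in how the transfer waits on the device. A rough sketch, where the function name, the completion pointer and the timeout parameter are placeholders rather than actual driver fields:

	#include <linux/completion.h>
	#include <linux/jiffies.h>
	#include <linux/errno.h>

	/* Illustrative only: the wait with and without a timeout. */
	static int sketch_wait_for_xfer(struct completion *transfer_done,
					unsigned long timeout_jiffies)
	{
		if (!timeout_jiffies) {
			/* Option 1: no timeout - trust the device to respond. */
			wait_for_completion(transfer_done);
			return 0;
		}

		/*
		 * Option 2: a generous default (e.g. the 10 * HZ value floated
		 * earlier in the thread). Even then, the buffers must not be
		 * freed until the device has returned them, or a late response
		 * corrupts memory.
		 */
		if (!wait_for_completion_timeout(transfer_done, timeout_jiffies))
			return -ETIMEDOUT;

		return 0;
	}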
Re: [PATCH V6] gpio: virtio: Add IRQ support
On 20-10-21, 18:10, Andy Shevchenko wrote:
> On Wednesday, October 20, 2021, Viresh Kumar wrote:
> > +static int virtio_gpio_irq_set_type(struct irq_data *d, unsigned int type)
> > +{
> > +	struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
> > +	struct virtio_gpio *vgpio = gpiochip_get_data(gc);
> > +	struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
> > +
> > +	switch (type) {
> > +	case IRQ_TYPE_NONE:
> > +		type = VIRTIO_GPIO_IRQ_TYPE_NONE;
> > +		break;
>
> IIRC you add dead code. IRQ framework never calls this if type is not set.

Yes, but it is allowed to call irq_set_irq_type(irq, IRQ_TYPE_NONE); and
the irq framework won't disallow it AFAICT.

> > +static void virtio_gpio_event_vq(struct virtqueue *vq)
> > +{
> > +	irq = irq_find_mapping(vgpio->gc.irq.domain, gpio);
> > +	WARN_ON(!irq);
> > +
> > +	ret = generic_handle_irq(irq);
>
> IIRC there is a new API that basically combines the two above.

generic_handle_domain_irq(), thanks.

> > struct virtio_gpio_config {
> > 	__le16 ngpio;
> > 	__u8 padding[2];
> > @@ -44,4 +56,17 @@ struct virtio_gpio_response_get_names {
> > 	__u8 value[];
> > };
> >
> > +/* Virtio GPIO IRQ Request / Response */
> > +struct virtio_gpio_irq_request {
> > +	__le16 gpio;
> > +};
> > +
> > +struct virtio_gpio_irq_response {
> > +	__u8 status;
> > +};
>
> I'm wondering if those above should be packed.

You are talking about the newly added ones or the ones before? In any
case, they are all already packed (i.e. they have explicit padding
wherever required) and properly aligned. The compiler won't add any other
padding to them.

-- 
viresh
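For readers following along, the consolidation Andy is pointing at is a drop-in replacement for the two quoted lines. generic_handle_domain_irq() was added in 5.15 and performs the lookup and the handling in one step (sketch of the two-line change only):

	/* Before: look up the Linux IRQ number, then handle it. */
	irq = irq_find_mapping(vgpio->gc.irq.domain, gpio);
	ret = generic_handle_irq(irq);

	/* After: one call; returns an error if no mapping exists for 'gpio'. */
	ret = generic_handle_domain_irq(vgpio->gc.irq.domain, gpio);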
Re: [PATCH 2/2] i2c: virtio: fix completion handling
On 2021/10/19 15:46, Vincent Whitchurch wrote:
> The driver currently assumes that the notify callback is only received
> when the device is done with all the queued buffers. However, this is
> not true, since the notify callback could be called without any of the
> queued buffers being completed (for example, with virtio-pci and shared
> interrupts) or with only some of the buffers being completed (since the
> driver makes them available to the device in multiple separate
> virtqueue_add_sgs() calls).

Can the backend driver control the time point of interrupt injection? I
can't think of why the backend has to send an early interrupt. This
operation should be avoided in the backend driver if possible. However,
this change makes sense if early interrupts can't be avoided.

> This can lead to incorrect data on the I2C bus or memory corruption in
> the guest if the device operates on buffers which have been freed by the
> driver. (The WARN_ON in the driver is also triggered.)
Re: [PATCH 2/2] i2c: virtio: fix completion handling
On 21-10-21, 13:55, Jie Deng wrote:
> Can the backend driver control the time point of interrupt injection? I
> can't think of why the backend has to send an early interrupt. This
> operation should be avoided in the backend driver if possible. However,
> this change makes sense if early interrupts can't be avoided.

The backend driver probably won't send an event, but the notification
event, if it comes, shouldn't have side effects like it currently has
(where we finish the ongoing transfer by calling complete()).

-- 
viresh
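The direction of the fix, then, is to treat a notification as "zero or more buffers may have completed" and to complete per request only as buffers actually come back from the virtqueue. A condensed sketch of that shape (paraphrased; see the actual patch for the real structure layout and locking):

	#include <linux/completion.h>
	#include <linux/virtio.h>

	/* Per-request state; each request carries its own completion. */
	struct virtio_i2c_req {
		struct completion completion;
		/* buffers for the out/in parts of the transfer would live here */
	};

	static void virtio_i2c_msg_done(struct virtqueue *vq)
	{
		struct virtio_i2c_req *req;
		unsigned int len;

		/* Only complete requests the device has actually returned. */
		while ((req = virtqueue_get_buf(vq, &len)))
			complete(&req->completion);
	}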
RE: [PATCH 0/5] iommu/virtio: Add identity domains
> From: j...@8bytes.org
> Sent: Monday, October 18, 2021 7:38 PM
>
> On Thu, Oct 14, 2021 at 03:00:38AM +0000, Tian, Kevin wrote:
> > I saw a concept of deferred attach in iommu core. See
> > iommu_is_attach_deferred(). Currently this is vendor specific and I
> > haven't looked into the exact reason why some vendor sets it now. Just
> > be curious whether the same reason might be applied to virtio-iommu.
>
> The reason for attach_deferred is kdump support, where the IOMMU driver
> needs to keep the mappings from the old kernel until the device driver
> of the new kernel takes over.

ok. Then there is no problem with the original statement: All endpoints
managed by the IOMMU should be attached to a domain, so global bypass
isn't in use after endpoints are probed.

Thanks
Kevin
RE: [PATCH 0/5] iommu/virtio: Add identity domains
> From: Jean-Philippe Brucker
> Sent: Monday, October 18, 2021 11:24 PM
>
> On Thu, Oct 14, 2021 at 03:00:38AM +0000, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker
> > > Sent: Wednesday, October 13, 2021 8:11 PM
> > >
> > > Support identity domains, allowing to only enable IOMMU protection for
> > > a subset of endpoints (those assigned to userspace, for example). Users
> > > may enable identity domains at compile time
> > > (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
> > > runtime (/sys/kernel/iommu_groups/*/type = identity).
> >
> > Do we want to use consistent terms between spec (bypass domain)
> > and code (identity domain)?
>
> I don't think we have to. Linux uses "identity" domains and "passthrough"
> IOMMU. The old virtio-iommu feature was "bypass" so we should keep that
> for the new one, to be consistent. And then I've used "bypass" for domains
> as well, in the spec, because it would look strange to use a different
> term for the same concept. I find that it sort of falls into place: Linux'
> identity domains can be implemented either with bypass or identity-mapped
> virtio-iommu domains.

Makes sense.

> > > Patches 1-2 support identity domains using the optional
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> > > spec, see [1] for the latest proposal.
> > >
> > > Patches 3-5 add a fallback to identity mappings, when the feature is not
> > > supported.
> > >
> > > Note that this series doesn't touch the global bypass bit added by
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU should
> > > be attached to a domain, so global bypass isn't in use after endpoints
> >
> > I saw a concept of deferred attach in iommu core. See
> > iommu_is_attach_deferred(). Currently this is vendor specific and I
> > haven't looked into the exact reason why some vendor sets it now. Just
> > be curious whether the same reason might be applied to virtio-iommu.
> >
> > > are probed. Before that, the global bypass policy is decided by the
> > > hypervisor and firmware. So I don't think Linux needs to touch the
> >
> > This reminds me of one thing. The spec says that the global bypass
> > bit is sticky and not affected by reset.
>
> The spec talks about *device* reset, triggered by software writing 0 to
> the status register, but it doesn't mention system reset. Would be good to
> clarify that. Something like:
>
>   If the device offers the VIRTIO_IOMMU_F_BYPASS_CONFIG feature, it MAY
>   initialize the \field{bypass} field to 1. Field \field{bypass} SHOULD
>   NOT change on device reset, but SHOULD be restored to its initial
>   value on system reset.

Looks good to me.

> > This implies that in the case
> > of rebooting the VM into a different OS, the previous OS actually
> > has the right to override this setting for the next OS. Is it a right
> > design? Even the firmware itself is unable to identify the original
> > setting enforced by the hypervisor after reboot. I feel the hypervisor
> > setting should be recovered after reset since it reflects the
> > security measure enforced by the virtual platform?
>
> So I think clarifying system reset should address your questions.
> I believe we should leave bypass sticky across device reset, so a FW->OS
> transition, where the OS resets the device, does not open a vulnerability
> (if bypass was enabled at boot and then left disabled by FW.)
>
> It's still a good idea for the OS to restore on shutdown the bypass value
> it was booted with.
> So it can kexec into an OS that doesn't support
> virtio-iommu, for example.

Thanks
Kevin
RE: [PATCH 0/5] iommu/virtio: Add identity domains
> From: Jean-Philippe Brucker
> Sent: Wednesday, October 13, 2021 8:11 PM
>
> Support identity domains, allowing to only enable IOMMU protection for a
> subset of endpoints (those assigned to userspace, for example). Users
> may enable identity domains at compile time
> (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
> runtime (/sys/kernel/iommu_groups/*/type = identity).
>
> Patches 1-2 support identity domains using the optional
> VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> spec, see [1] for the latest proposal.
>
> Patches 3-5 add a fallback to identity mappings, when the feature is not
> supported.
>
> Note that this series doesn't touch the global bypass bit added by
> VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU should
> be attached to a domain, so global bypass isn't in use after endpoints
> are probed. Before that, the global bypass policy is decided by the
> hypervisor and firmware. So I don't think Linux needs to touch the
> global bypass bit, but there are some patches available on my
> virtio-iommu/bypass branch [2] to test it.
>
> QEMU patches are on my virtio-iommu/bypass branch [3] (and the list)
>
> [1] https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07898.html
> [2] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/bypass
> [3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass
>
> Jean-Philippe Brucker (5):
>   iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
>   iommu/virtio: Support bypass domains
>   iommu/virtio: Sort reserved regions
>   iommu/virtio: Pass end address to viommu_add_mapping()
>   iommu/virtio: Support identity-mapped domains
>
>  include/uapi/linux/virtio_iommu.h |   8 ++-
>  drivers/iommu/virtio-iommu.c      | 113 +-
>  2 files changed, 101 insertions(+), 20 deletions(-)

For this series:

Reviewed-by: Kevin Tian
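As a rough illustration of what the interface in patches 1-2 adds, the additions to include/uapi/linux/virtio_iommu.h look approximately like the sketch below. The exact bit and flag values come from my reading of the spec proposal in [1] and could differ in the merged version, so treat them as assumptions rather than the authoritative header:

	/* Sketch of the VIRTIO_IOMMU_F_BYPASS_CONFIG additions (values assumed). */
	#define VIRTIO_IOMMU_F_BYPASS_CONFIG	8	/* new feature bit */

	struct virtio_iommu_config {
		__le64				page_size_mask;
		struct virtio_iommu_range_64	input_range;
		struct virtio_iommu_range_32	domain_range;
		__le32				probe_size;
		__u8				bypass;		/* new: global bypass toggle */
		__u8				reserved[3];
	};

	/* New flag for VIRTIO_IOMMU_T_ATTACH: make the attached domain bypass. */
	#define VIRTIO_IOMMU_ATTACH_F_BYPASS	(1 << 0)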