Re: [PATCH 1/2] i2c: virtio: disable timeout handling

2021-10-20 Thread Jie Deng



On 2021/10/20 14:41, Viresh Kumar wrote:

On 20-10-21, 14:35, Jie Deng wrote:

Yes, but we need to know what's the best value to be configured for a
specific "other side".

I think the "other side" should be more aware of what value is reasonable to
be used.

If we _really_ need that, then it would require an update to the
specification first.

I am not sure if the other side is the only party in play here. It
depends on the entire setup and not just the hardware device.
Specially with virtualisation things become really slow because of
context switches, etc. It may be better for the guest userspace to
decide on the value.

Since this is specially for virtualisation, the kernel may set the
value to few HZ by default, lets say 10 (Yeah its really high) :).



I'm OK with doing it this way, for simplicity.
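A minimal sketch of where such a default could live, for illustration only:
the i2c core takes the per-transfer timeout from the adapter's "timeout"
field (in jiffies), and the helper name and the vi->adap member below are
assumptions about the driver, not existing code.

static void virtio_i2c_set_default_timeout(struct virtio_i2c *vi)
{
	/*
	 * Deliberately large default for virtualised setups, as discussed
	 * above; userspace can still adjust it later through i2c-dev.
	 */
	vi->adap.timeout = 10 * HZ;
}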




Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe

2021-10-20 Thread Stefano Garzarella

On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:

If an untrusted device neogitates BLK_F_MQ but advertises a zero


s/neogitates/negotiates


num_queues, the driver may end up trying to allocating zero size
buffers where ZERO_SIZE_PTR is returned which may pass the checking
against the NULL. This will lead unexpected results.

Fixing this by failing the probe in this case.

Cc: Paolo Bonzini 
Cc: Stefan Hajnoczi 
Cc: Stefano Garzarella 
Signed-off-by: Jason Wang 
---
drivers/block/virtio_blk.c | 4 
1 file changed, 4 insertions(+)


Should we CC stable?

Reviewed-by: Stefano Garzarella 



Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe

2021-10-20 Thread Michael S. Tsirkin
On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote:
> On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
> > If an untrusted device neogitates BLK_F_MQ but advertises a zero
> 
> s/neogitates/negotiates
> 
> > num_queues, the driver may end up trying to allocating zero size
> > buffers where ZERO_SIZE_PTR is returned which may pass the checking
> > against the NULL. This will lead unexpected results.
> > 
> > Fixing this by failing the probe in this case.
> > 
> > Cc: Paolo Bonzini 
> > Cc: Stefan Hajnoczi 
> > Cc: Stefano Garzarella 
> > Signed-off-by: Jason Wang 
> > ---
> > drivers/block/virtio_blk.c | 4 
> > 1 file changed, 4 insertions(+)
> 
> Should we CC stable?
> 
> Reviewed-by: Stefano Garzarella 

No IMO - I don't think we can reasonably expect stable to become
protected against attacks on encrypted guests. That's
a new feature, not a bugfix.

-- 
MST



Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe

2021-10-20 Thread Stefan Hajnoczi
On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
> If an untrusted device neogitates BLK_F_MQ but advertises a zero
> num_queues, the driver may end up trying to allocating zero size
> buffers where ZERO_SIZE_PTR is returned which may pass the checking
> against the NULL. This will lead unexpected results.
> 
> Fixing this by failing the probe in this case.
> 
> Cc: Paolo Bonzini 
> Cc: Stefan Hajnoczi 
> Cc: Stefano Garzarella 
> Signed-off-by: Jason Wang 
> ---
>  drivers/block/virtio_blk.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Michael S. Tsirkin
On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote:
> On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin  
> wrote:
> > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote:
> > > When using indirect with packed, we don't check for allocation failures.
> > > This patch checks that and fall back on direct.
> > >
> > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/virtio/virtio_ring.c | 13 ++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 91a46c4da87d..44a03b6e4dc4 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1065,6 +1065,9 @@ static int virtqueue_add_indirect_packed(struct 
> > > vring_virtqueue *vq,
> > >
> > >   head = vq->packed.next_avail_idx;
> > >   desc = alloc_indirect_packed(total_sg, gfp);
> > > + if (!desc)
> > > + /* fall back on direct */
> >
> > this comment belongs in virtqueue_add_packed right after
> > return.
> >
> > > + return -EAGAIN;
> > >
> >
> > -ENOMEM surely?
> 
> virtqueue_add_indirect_packed() will return -ENOMEM when dma mmap fails, so we
> have to choose a different error code.
> 
> Thanks.

That's exactly the point.  Why would we want to recover from allocation
failure but not DMA map failure?

> >
> > >   if (unlikely(vq->vq.num_free < 1)) {
> > >   pr_debug("Can't add buf len 1 - avail = 0\n");
> > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct 
> > > virtqueue *_vq,
> > >   unsigned int i, n, c, descs_used, err_idx;
> > >   __le16 head_flags, flags;
> > >   u16 head, id, prev, curr, avail_used_flags;
> > > + int err;
> > >
> > >   START_USE(vq);
> > >
> > > @@ -1191,9 +1195,12 @@ static inline int virtqueue_add_packed(struct 
> > > virtqueue *_vq,
> > >
> > >   BUG_ON(total_sg == 0);
> > >
> > > - if (virtqueue_use_indirect(_vq, total_sg))
> > > - return virtqueue_add_indirect_packed(vq, sgs, total_sg,
> > > - out_sgs, in_sgs, data, gfp);
> > > + if (virtqueue_use_indirect(_vq, total_sg)) {
> > > + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
> > > + in_sgs, data, gfp);
> > > + if (err != -EAGAIN)
> > > + return err;
> > > + }
> > >
> > >   head = vq->packed.next_avail_idx;
> > >   avail_used_flags = vq->packed.avail_used_flags;
> > > --
> > > 2.31.0
> >



Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe

2021-10-20 Thread Stefano Garzarella

On Wed, Oct 20, 2021 at 03:37:31AM -0400, Michael S. Tsirkin wrote:

On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote:

On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
> If an untrusted device neogitates BLK_F_MQ but advertises a zero

s/neogitates/negotiates

> num_queues, the driver may end up trying to allocating zero size
> buffers where ZERO_SIZE_PTR is returned which may pass the checking
> against the NULL. This will lead unexpected results.
>
> Fixing this by failing the probe in this case.
>
> Cc: Paolo Bonzini 
> Cc: Stefan Hajnoczi 
> Cc: Stefano Garzarella 
> Signed-off-by: Jason Wang 
> ---
> drivers/block/virtio_blk.c | 4 
> 1 file changed, 4 insertions(+)

Should we CC stable?

Reviewed-by: Stefano Garzarella 


No IMO - I don't think we can reasonably expect stable to become
protected against attacks on encrypted guests. That's
a new feature, not a bugfix.


Yep, makes sense.
I had only seen the single patch, not the entire series, and it seemed 
like a fix.


Viewed as a whole, it makes sense to consider it a new feature to 
improve audits in the guest.


Thanks,
Stefano



Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-20 Thread Jie Deng



On 2021/10/19 16:22, Viresh Kumar wrote:

On 19-10-21, 09:46, Vincent Whitchurch wrote:

  static void virtio_i2c_msg_done(struct virtqueue *vq)
  {
-   struct virtio_i2c *vi = vq->vdev->priv;
+   struct virtio_i2c_req *req;
+   unsigned int len;
  
-	complete(&vi->completion);

+   while ((req = virtqueue_get_buf(vq, &len)))
+   complete(&req->completion);

Instead of adding a completion for each request and using only the
last one, maybe we can do this instead here:

while ((req = virtqueue_get_buf(vq, &len))) {
 if (req->out_hdr.flags == 
cpu_to_le32(VIRTIO_I2C_FLAGS_FAIL_NEXT))



Is this the check for the last one? For the last one, this bit should be
cleared, right?




 complete(&vi->completion);
 }

Since we already know which is the last one, we can also check at this
point if buffers for all other requests are received or not.




Re: [PATCH v11 2/3] tty: hvc: pass DMA capable memory to put_chars()

2021-10-20 Thread Greg KH
A: http://en.wikipedia.org/wiki/Top_post
Q: Were do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Wed, Oct 20, 2021 at 04:47:23PM +0800, Xianting Tian wrote:
> hi Greg,
> 
> Could I get  your comments of this new version patches? thanks

It has been less than 5 days.  Please relax, and only worry after 2
weeks have gone by.  We have lots of patches to review.  To help
maintainers out, why don't you review other patches on the mailing lists
as well while you wait?

thanks,

greg k-h

Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ

2021-10-20 Thread Jason Wang
On Wed, Oct 20, 2021 at 2:52 PM Eugenio Perez Martin
 wrote:
>
> On Wed, Oct 20, 2021 at 4:07 AM Jason Wang  wrote:
> >
> > On Wed, Oct 20, 2021 at 10:02 AM Jason Wang  wrote:
> > >
> > > On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin
> > >  wrote:
> > > >
> > > > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang  wrote:
> > > > >
> > > > >
> > > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > > > > Use translations added in VhostIOVATree in SVQ.
> > > > > >
> > > > > > Now every element needs to store the previous address also, so 
> > > > > > VirtQueue
> > > > > > can consume the elements properly. This adds a little overhead per 
> > > > > > VQ
> > > > > > element, having to allocate more memory to stash them. As a possible
> > > > > > optimization, this allocation could be avoided if the descriptor is 
> > > > > > not
> > > > > > a chain but a single one, but this is left undone.
> > > > > >
> > > > > > TODO: iova range should be queried before, and add logic to fail 
> > > > > > when
> > > > > > GPA is outside of its range and memory listener or svq add it.
> > > > > >
> > > > > > Signed-off-by: Eugenio Pérez 
> > > > > > ---
> > > > > >   hw/virtio/vhost-shadow-virtqueue.h |   4 +-
> > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 130 
> > > > > > -
> > > > > >   hw/virtio/vhost-vdpa.c |  40 -
> > > > > >   hw/virtio/trace-events |   1 +
> > > > > >   4 files changed, 152 insertions(+), 23 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > > > > > b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > index b7baa424a7..a0e6b5267a 100644
> > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > @@ -11,6 +11,7 @@
> > > > > >   #define VHOST_SHADOW_VIRTQUEUE_H
> > > > > >
> > > > > >   #include "hw/virtio/vhost.h"
> > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > >
> > > > > >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > > >
> > > > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, 
> > > > > > unsigned idx,
> > > > > >   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > >   VhostShadowVirtqueue *svq);
> > > > > >
> > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int 
> > > > > > idx);
> > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > +VhostIOVATree *iova_map);
> > > > > >
> > > > > >   void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > > >
> > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > > > > > b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > index 2fd0bab75d..9db538547e 100644
> > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > @@ -11,12 +11,19 @@
> > > > > >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > > > >   #include "hw/virtio/vhost.h"
> > > > > >   #include "hw/virtio/virtio-access.h"
> > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > >
> > > > > >   #include "standard-headers/linux/vhost_types.h"
> > > > > >
> > > > > >   #include "qemu/error-report.h"
> > > > > >   #include "qemu/main-loop.h"
> > > > > >
> > > > > > +typedef struct SVQElement {
> > > > > > +VirtQueueElement elem;
> > > > > > +void **in_sg_stash;
> > > > > > +void **out_sg_stash;
> > > > > > +} SVQElement;
> > > > > > +
> > > > > >   /* Shadow virtqueue to relay notifications */
> > > > > >   typedef struct VhostShadowVirtqueue {
> > > > > >   /* Shadow vring */
> > > > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> > > > > >   /* Virtio device */
> > > > > >   VirtIODevice *vdev;
> > > > > >
> > > > > > +/* IOVA mapping if used */
> > > > > > +VhostIOVATree *iova_map;
> > > > > > +
> > > > > >   /* Map for returning guest's descriptors */
> > > > > > -VirtQueueElement **ring_id_maps;
> > > > > > +SVQElement **ring_id_maps;
> > > > > >
> > > > > >   /* Next head to expose to device */
> > > > > >   uint16_t avail_idx_shadow;
> > > > > > @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t 
> > > > > > *dev_features)
> > > > > >   continue;
> > > > > >
> > > > > >   case VIRTIO_F_ACCESS_PLATFORM:
> > > > > > -/* SVQ needs this feature disabled. Can't continue */
> > > > > > -if (*dev_features & BIT_ULL(b)) {
> > > > > > -clear_bit(b, dev_features);
> > > > > > -r = false;
> > > > > > -}
> > > > > > -break;
> > > > > > -
> > > > > >   case VIRTIO_F_VERSION_1:
> > > > > >   /* SVQ needs this feature, so can't continue */
> > > > > >   if (!(*dev_features & BIT_ULL(b))) {
> > > > > > @@ -126,6 +129,64 @@ static void 
> > > > > > vhost_svq_set_notification(VhostShadowVirtqueue *svq, b

Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-20 Thread Viresh Kumar
On 20-10-21, 16:54, Jie Deng wrote:
> 
> On 2021/10/19 16:22, Viresh Kumar wrote:
> > On 19-10-21, 09:46, Vincent Whitchurch wrote:
> > >   static void virtio_i2c_msg_done(struct virtqueue *vq)
> > >   {
> > > - struct virtio_i2c *vi = vq->vdev->priv;
> > > + struct virtio_i2c_req *req;
> > > + unsigned int len;
> > > - complete(&vi->completion);
> > > + while ((req = virtqueue_get_buf(vq, &len)))
> > > + complete(&req->completion);
> > Instead of adding a completion for each request and using only the
> > last one, maybe we can do this instead here:
> > 
> > while ((req = virtqueue_get_buf(vq, &len))) {
> >  if (req->out_hdr.flags == 
> > cpu_to_le32(VIRTIO_I2C_FLAGS_FAIL_NEXT))
> 
> 
> Is this for the last one check ? For the last one, this bit should be
> cleared, right ?

Oops, you are right. This should be `!=` instead. Thanks.

-- 
viresh


[PATCH V6] gpio: virtio: Add IRQ support

2021-10-20 Thread Viresh Kumar
This patch adds IRQ support for the virtio GPIO driver. Note that this
uses the irq_bus_lock/unlock() callbacks, since those operations over
virtio may sleep.

Reviewed-by: Linus Walleij 
Signed-off-by: Viresh Kumar 
---
Bartosz,

The spec changes are close to merging now; I will let you know once the ballot
is closed and the spec changes are merged. You can then pick this patch for
5.16.

V5->V6:
- Sent it separately as the other patches are already merged.
- Queue the buffers only after enabling the irq (as per latest spec).
- Migrate to raw_spinlock_t.

 drivers/gpio/Kconfig |   1 +
 drivers/gpio/gpio-virtio.c   | 310 ++-
 include/uapi/linux/virtio_gpio.h |  25 +++
 3 files changed, 332 insertions(+), 4 deletions(-)

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index fae5141251e5..bfa723ff8e7c 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -1674,6 +1674,7 @@ config GPIO_MOCKUP
 config GPIO_VIRTIO
tristate "VirtIO GPIO support"
depends on VIRTIO
+   select GPIOLIB_IRQCHIP
help
  Say Y here to enable guest support for virtio-based GPIO controllers.
 
diff --git a/drivers/gpio/gpio-virtio.c b/drivers/gpio/gpio-virtio.c
index d24f1c9264bc..73fbe2eda4b9 100644
--- a/drivers/gpio/gpio-virtio.c
+++ b/drivers/gpio/gpio-virtio.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,12 +29,30 @@ struct virtio_gpio_line {
unsigned int rxlen;
 };
 
+struct vgpio_irq_line {
+   u8 type;
+   bool disabled;
+   bool masked;
+   bool queued;
+   bool update_pending;
+   bool queue_pending;
+
+   struct virtio_gpio_irq_request ireq cacheline_aligned;
+   struct virtio_gpio_irq_response ires cacheline_aligned;
+};
+
 struct virtio_gpio {
struct virtio_device *vdev;
struct mutex lock; /* Protects virtqueue operation */
struct gpio_chip gc;
struct virtio_gpio_line *lines;
struct virtqueue *request_vq;
+
+   /* irq support */
+   struct virtqueue *event_vq;
+   struct mutex irq_lock; /* Protects irq operation */
+   raw_spinlock_t eventq_lock; /* Protects queuing of the buffer */
+   struct vgpio_irq_line *irq_lines;
 };
 
 static int _virtio_gpio_req(struct virtio_gpio *vgpio, u16 type, u16 gpio,
@@ -186,6 +205,244 @@ static void virtio_gpio_set(struct gpio_chip *gc, 
unsigned int gpio, int value)
virtio_gpio_req(vgpio, VIRTIO_GPIO_MSG_SET_VALUE, gpio, value, NULL);
 }
 
+/* Interrupt handling */
+static void virtio_gpio_irq_prepare(struct virtio_gpio *vgpio, u16 gpio)
+{
+   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[gpio];
+   struct virtio_gpio_irq_request *ireq = &irq_line->ireq;
+   struct virtio_gpio_irq_response *ires = &irq_line->ires;
+   struct scatterlist *sgs[2], req_sg, res_sg;
+   int ret;
+
+   if (WARN_ON(irq_line->queued || irq_line->masked || irq_line->disabled))
+   return;
+
+   ireq->gpio = cpu_to_le16(gpio);
+   sg_init_one(&req_sg, ireq, sizeof(*ireq));
+   sg_init_one(&res_sg, ires, sizeof(*ires));
+   sgs[0] = &req_sg;
+   sgs[1] = &res_sg;
+
+   ret = virtqueue_add_sgs(vgpio->event_vq, sgs, 1, 1, irq_line, 
GFP_ATOMIC);
+   if (ret) {
+   dev_err(&vgpio->vdev->dev, "failed to add request to eventq\n");
+   return;
+   }
+
+   irq_line->queued = true;
+   virtqueue_kick(vgpio->event_vq);
+}
+
+static void virtio_gpio_irq_enable(struct irq_data *d)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
+   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
+
+   raw_spin_lock(&vgpio->eventq_lock);
+   irq_line->disabled = false;
+   irq_line->masked = false;
+   irq_line->queue_pending = true;
+   raw_spin_unlock(&vgpio->eventq_lock);
+
+   irq_line->update_pending = true;
+}
+
+static void virtio_gpio_irq_disable(struct irq_data *d)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
+   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
+
+   raw_spin_lock(&vgpio->eventq_lock);
+   irq_line->disabled = true;
+   irq_line->masked = true;
+   irq_line->queue_pending = false;
+   raw_spin_unlock(&vgpio->eventq_lock);
+
+   irq_line->update_pending = true;
+}
+
+static void virtio_gpio_irq_mask(struct irq_data *d)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
+   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
+
+   raw_spin_lock(&vgpio->eventq_lock);
+   irq_line->masked = true;
+   raw_spin_unlock(&vgpio->eventq_lock);
+}
+
+static void virtio_gpio_irq_unmask(struct irq_data *d)
+{
+   struct gpio

Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-20 Thread Vincent Whitchurch
On Wed, Oct 20, 2021 at 11:17:21AM +0200, Viresh Kumar wrote:
> On 20-10-21, 16:54, Jie Deng wrote:
> > 
> > On 2021/10/19 16:22, Viresh Kumar wrote:
> > > On 19-10-21, 09:46, Vincent Whitchurch wrote:
> > > >   static void virtio_i2c_msg_done(struct virtqueue *vq)
> > > >   {
> > > > -   struct virtio_i2c *vi = vq->vdev->priv;
> > > > +   struct virtio_i2c_req *req;
> > > > +   unsigned int len;
> > > > -   complete(&vi->completion);
> > > > +   while ((req = virtqueue_get_buf(vq, &len)))
> > > > +   complete(&req->completion);
> > > Instead of adding a completion for each request and using only the
> > > last one, maybe we can do this instead here:
> > > 
> > >   while ((req = virtqueue_get_buf(vq, &len))) {
> > >  if (req->out_hdr.flags == 
> > > cpu_to_le32(VIRTIO_I2C_FLAGS_FAIL_NEXT))
> > 
> > 
> > Is this for the last one check ? For the last one, this bit should be
> > cleared, right ?
> 
> Oops, you are right. This should be `!=` instead. Thanks.

I don't quite understand how that would be safe since
virtqueue_add_sgs() can fail after a few iterations and all queued
request buffers can have FAIL_NEXT set.  In such a case, we would end up
waiting forever with your proposed change, wouldn't we?


Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-20 Thread Viresh Kumar
On 20-10-21, 12:38, Vincent Whitchurch wrote:
> I don't quite understand how that would be safe since
> virtqueue_add_sgs() can fail after a few iterations and all queued
> request buffers can have FAIL_NEXT set.  In such a case, we would end up
> waiting forever with your proposed change, wouldn't we?

Good point. I didn't think of that earlier.

I think a good, simple way of handling this is to count the number of
buffers sent and received. Once they match, we are done. That
shouldn't break anything else, I believe.
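A minimal sketch of what that counting could look like in the done callback
(illustrative only; the num_queued/num_done counters are hypothetical fields,
not existing members of struct virtio_i2c):

static void virtio_i2c_msg_done(struct virtqueue *vq)
{
	struct virtio_i2c *vi = vq->vdev->priv;
	unsigned int len;

	/* Reap everything the device has returned so far. */
	while (virtqueue_get_buf(vq, &len))
		vi->num_done++;

	/* Wake the transfer only once every queued buffer has come back. */
	if (vi->num_done == vi->num_queued)
		complete(&vi->completion);
}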

-- 
viresh


Re: [PATCH 1/2] i2c: virtio: disable timeout handling

2021-10-20 Thread Vincent Whitchurch
On Wed, Oct 20, 2021 at 09:04:45AM +0200, Jie Deng wrote:
> On 2021/10/20 14:41, Viresh Kumar wrote:
> > On 20-10-21, 14:35, Jie Deng wrote:
> >> Yes, but we need to know what's the best value to be configured for a
> >> specific "other side".
> >>
> >> I think the "other side" should be more aware of what value is reasonable 
> >> to
> >> be used.
> > If we _really_ need that, then it would require an update to the
> > specification first.
> >
> > I am not sure if the other side is the only party in play here. It
> > depends on the entire setup and not just the hardware device.
> > Specially with virtualisation things become really slow because of
> > context switches, etc. It may be better for the guest userspace to
> > decide on the value.
> >
> > Since this is specially for virtualisation, the kernel may set the
> > value to few HZ by default, lets say 10 (Yeah its really high) :).
> 
> I'm OK with this way for the simplicity.

That would not be safe.  Userspace can change this timeout, and with the
current implementation of the driver the end result is potentially
kernel memory corruption, which is something userspace should of course
never be able to trigger.

Even if the timeout were hardcoded in the driver and the driver made to
ignore what userspace requests, it would need to be set to a
ridiculously high value so that we can guarantee that the timeout never
ever occurs (since the result is potentially random kernel memory
corruption).  Which is basically equivalent to disabling the timeout
entirely as my patch does.

If the timeout cannot be disabled, then the driver should be fixed to
always copy buffers and hold on to them to avoid memory corruption in
the case of timeout, as I mentioned in my commit message.  That would be
quite a substantial change to the driver so it's not something I'm
personally comfortable with doing, especially not this late in the -rc
cycle, so I'd leave that to others.


Re: [PATCH 1/2] i2c: virtio: disable timeout handling

2021-10-20 Thread Viresh Kumar
On 20-10-21, 12:55, Vincent Whitchurch wrote:
> If the timeout cannot be disabled, then the driver should be fixed to
> always copy buffers and hold on to them to avoid memory corruption in
> the case of timeout, as I mentioned in my commit message.  That would be
> quite a substantial change to the driver so it's not something I'm
> personally comfortable with doing, especially not this late in the -rc
> cycle, so I'd leave that to others.

Or we can avoid clearing up and freeing the buffers here until the
point where the buffers are returned by the host. Until that happens,
we can refuse new requests but still return a timeout failure to the
earlier caller. That would avoid the corruption, free the buffers as
soon as the host is done with them, and not hang the kernel.
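Roughly, on the transfer path that could look like the snippet below (a
sketch only; the xfer_failed flag is an illustrative name, not an existing
driver field):

	if (!wait_for_completion_timeout(&vi->completion, adap->timeout)) {
		/*
		 * The device still owns the request buffers, so do not free
		 * them here.  Flag the failure so new transfers are refused
		 * until the done callback finally reaps those buffers.
		 */
		vi->xfer_failed = true;
		return -ETIMEDOUT;
	}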

-- 
viresh


[PATCH v5 0/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Xuan Zhuo
Fix theoretical issues in virtio_ring.

v5:
Update commit message.

v4:
Inside the #2 patch, virtqueue_add_indirect_packed() return -EAGAIN when
desc == NULL.

v3:
Update commit message of the #1 patch.

v2:
Separate the style fix into a single patch.


Xuan Zhuo (2):
  virtio_ring: make virtqueue_add_indirect_packed prettier
  virtio_ring: check desc == NULL when using indirect with packed

 drivers/virtio/virtio_ring.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

--
2.31.0



[PATCH v5 1/2] virtio_ring: make virtqueue_add_indirect_packed prettier

2021-10-20 Thread Xuan Zhuo
Align the arguments of virtqueue_add_indirect_packed() with the opening
parenthesis to make it look prettier.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..91a46c4da87d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1050,12 +1050,12 @@ static struct vring_packed_desc 
*alloc_indirect_packed(unsigned int total_sg,
 }
 
 static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
-  struct scatterlist *sgs[],
-  unsigned int total_sg,
-  unsigned int out_sgs,
-  unsigned int in_sgs,
-  void *data,
-  gfp_t gfp)
+struct scatterlist *sgs[],
+unsigned int total_sg,
+unsigned int out_sgs,
+unsigned int in_sgs,
+void *data,
+gfp_t gfp)
 {
struct vring_packed_desc *desc;
struct scatterlist *sg;
-- 
2.31.0



[PATCH v5 2/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Xuan Zhuo
When using indirect with packed, we don't check for allocation failures.
This patch checks that and falls back on direct.

Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 91a46c4da87d..552055157b21 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1065,6 +1065,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
 
head = vq->packed.next_avail_idx;
desc = alloc_indirect_packed(total_sg, gfp);
+   if (!desc)
+   return -EAGAIN;
 
if (unlikely(vq->vq.num_free < 1)) {
pr_debug("Can't add buf len 1 - avail = 0\n");
@@ -1176,6 +1178,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
unsigned int i, n, c, descs_used, err_idx;
__le16 head_flags, flags;
u16 head, id, prev, curr, avail_used_flags;
+   int err;
 
START_USE(vq);
 
@@ -1191,9 +1194,14 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
 
BUG_ON(total_sg == 0);
 
-   if (virtqueue_use_indirect(_vq, total_sg))
-   return virtqueue_add_indirect_packed(vq, sgs, total_sg,
-   out_sgs, in_sgs, data, gfp);
+   if (virtqueue_use_indirect(_vq, total_sg)) {
+   err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
+   in_sgs, data, gfp);
+   if (err != -EAGAIN)
+   return err;
+
+   /* fall back on direct */
+   }
 
head = vq->packed.next_avail_idx;
avail_used_flags = vq->packed.avail_used_flags;
-- 
2.31.0



Re: [PATCH v5 2/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Michael S. Tsirkin
On Wed, Oct 20, 2021 at 07:05:51PM +0800, Xuan Zhuo wrote:
> When using indirect with packed, we don't check for allocation failures.
> This patch checks that and fall back on direct.
> 
> Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio_ring.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 91a46c4da87d..552055157b21 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1065,6 +1065,8 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
>  
>   head = vq->packed.next_avail_idx;
>   desc = alloc_indirect_packed(total_sg, gfp);
> + if (!desc)
> + return -EAGAIN;
>  
>   if (unlikely(vq->vq.num_free < 1)) {
>   pr_debug("Can't add buf len 1 - avail = 0\n");

I still think ENOMEM is better, this way we handle dma map
failures in the same way.

> @@ -1176,6 +1178,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
> *_vq,
>   unsigned int i, n, c, descs_used, err_idx;
>   __le16 head_flags, flags;
>   u16 head, id, prev, curr, avail_used_flags;
> + int err;
>  
>   START_USE(vq);
>  
> @@ -1191,9 +1194,14 @@ static inline int virtqueue_add_packed(struct 
> virtqueue *_vq,
>  
>   BUG_ON(total_sg == 0);
>  
> - if (virtqueue_use_indirect(_vq, total_sg))
> - return virtqueue_add_indirect_packed(vq, sgs, total_sg,
> - out_sgs, in_sgs, data, gfp);
> + if (virtqueue_use_indirect(_vq, total_sg)) {
> + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
> + in_sgs, data, gfp);
> + if (err != -EAGAIN)
> + return err;
> +
> + /* fall back on direct */
> + }
>  
>   head = vq->packed.next_avail_idx;
>   avail_used_flags = vq->packed.avail_used_flags;
> -- 
> 2.31.0



Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Xuan Zhuo
On Wed, 20 Oct 2021 04:24:33 -0400, Michael S. Tsirkin  wrote:
> On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote:
> > On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin  
> > wrote:
> > > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote:
> > > > When using indirect with packed, we don't check for allocation failures.
> > > > This patch checks that and fall back on direct.
> > > >
> > > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 13 ++---
> > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 91a46c4da87d..44a03b6e4dc4 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -1065,6 +1065,9 @@ static int virtqueue_add_indirect_packed(struct 
> > > > vring_virtqueue *vq,
> > > >
> > > > head = vq->packed.next_avail_idx;
> > > > desc = alloc_indirect_packed(total_sg, gfp);
> > > > +   if (!desc)
> > > > +   /* fall back on direct */
> > >
> > > this comment belongs in virtqueue_add_packed right after
> > > return.
> > >
> > > > +   return -EAGAIN;
> > > >
> > >
> > > -ENOMEM surely?
> >
> > virtqueue_add_indirect_packed() will return -ENOMEM when dma mmap fails, so 
> > we
> > have to choose a different error code.
> >
> > Thanks.
>
> That's exactly the point.  Why would we want to recover from allocation
> failure but not DMA map failure?

From indirect to direct mode, there is no need to allocate memory, so if we
encounter a memory allocation failure, we can fall back from indirect to direct
mode. Memory allocation failure is a temporary problem.

And if you encounter a DMA map error, that situation should be reported to the
upper layer.

Thanks.

>
> > >
> > > > if (unlikely(vq->vq.num_free < 1)) {
> > > > pr_debug("Can't add buf len 1 - avail = 0\n");
> > > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct 
> > > > virtqueue *_vq,
> > > > unsigned int i, n, c, descs_used, err_idx;
> > > > __le16 head_flags, flags;
> > > > u16 head, id, prev, curr, avail_used_flags;
> > > > +   int err;
> > > >
> > > > START_USE(vq);
> > > >
> > > > @@ -1191,9 +1195,12 @@ static inline int virtqueue_add_packed(struct 
> > > > virtqueue *_vq,
> > > >
> > > > BUG_ON(total_sg == 0);
> > > >
> > > > -   if (virtqueue_use_indirect(_vq, total_sg))
> > > > -   return virtqueue_add_indirect_packed(vq, sgs, total_sg,
> > > > -   out_sgs, in_sgs, data, gfp);
> > > > +   if (virtqueue_use_indirect(_vq, total_sg)) {
> > > > +   err = virtqueue_add_indirect_packed(vq, sgs, total_sg, 
> > > > out_sgs,
> > > > +   in_sgs, data, gfp);
> > > > +   if (err != -EAGAIN)
> > > > +   return err;
> > > > +   }
> > > >
> > > > head = vq->packed.next_avail_idx;
> > > > avail_used_flags = vq->packed.avail_used_flags;
> > > > --
> > > > 2.31.0
> > >
>


[PATCH v6 2/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Xuan Zhuo
When using indirect with packed, we don't check for allocation failures.
This patch checks that and falls back on direct.

Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 91a46c4da87d..bc92a2faa28f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1065,6 +1065,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
 
head = vq->packed.next_avail_idx;
desc = alloc_indirect_packed(total_sg, gfp);
+   if (!desc)
+   return -ENOMEM;
 
if (unlikely(vq->vq.num_free < 1)) {
pr_debug("Can't add buf len 1 - avail = 0\n");
@@ -1176,6 +1178,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
unsigned int i, n, c, descs_used, err_idx;
__le16 head_flags, flags;
u16 head, id, prev, curr, avail_used_flags;
+   int err;
 
START_USE(vq);
 
@@ -1191,9 +1194,14 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
 
BUG_ON(total_sg == 0);
 
-   if (virtqueue_use_indirect(_vq, total_sg))
-   return virtqueue_add_indirect_packed(vq, sgs, total_sg,
-   out_sgs, in_sgs, data, gfp);
+   if (virtqueue_use_indirect(_vq, total_sg)) {
+   err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
+   in_sgs, data, gfp);
+   if (err != -ENOMEM)
+   return err;
+
+   /* fall back on direct */
+   }
 
head = vq->packed.next_avail_idx;
avail_used_flags = vq->packed.avail_used_flags;
-- 
2.31.0



[PATCH v6 1/2] virtio_ring: make virtqueue_add_indirect_packed prettier

2021-10-20 Thread Xuan Zhuo
Align the arguments of virtqueue_add_indirect_packed() with the opening
parenthesis to make it look prettier.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..91a46c4da87d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1050,12 +1050,12 @@ static struct vring_packed_desc 
*alloc_indirect_packed(unsigned int total_sg,
 }
 
 static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
-  struct scatterlist *sgs[],
-  unsigned int total_sg,
-  unsigned int out_sgs,
-  unsigned int in_sgs,
-  void *data,
-  gfp_t gfp)
+struct scatterlist *sgs[],
+unsigned int total_sg,
+unsigned int out_sgs,
+unsigned int in_sgs,
+void *data,
+gfp_t gfp)
 {
struct vring_packed_desc *desc;
struct scatterlist *sg;
-- 
2.31.0



[PATCH v6 0/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Xuan Zhuo
Fix theoretical issues in virtio_ring.

v6:
-EAGAIN => -ENOMEM

v5:
Update commit message.

v4:
Inside the #2 patch, virtqueue_add_indirect_packed() return -EAGAIN when
desc == NULL.

v3:
Update commit message of the #1 patch.

v2:
Separate the style fix into a single patch.

Xuan Zhuo (2):
  virtio_ring: make virtqueue_add_indirect_packed prettier
  virtio_ring: check desc == NULL when using indirect with packed

 drivers/virtio/virtio_ring.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

--
2.31.0



Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Michael S. Tsirkin
On Wed, Oct 20, 2021 at 07:07:43PM +0800, Xuan Zhuo wrote:
> On Wed, 20 Oct 2021 04:24:33 -0400, Michael S. Tsirkin  
> wrote:
> > On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote:
> > > On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin  
> > > wrote:
> > > > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote:
> > > > > When using indirect with packed, we don't check for allocation 
> > > > > failures.
> > > > > This patch checks that and fall back on direct.
> > > > >
> > > > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
> > > > > Signed-off-by: Xuan Zhuo 
> > > > > ---
> > > > >  drivers/virtio/virtio_ring.c | 13 ++---
> > > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c 
> > > > > b/drivers/virtio/virtio_ring.c
> > > > > index 91a46c4da87d..44a03b6e4dc4 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -1065,6 +1065,9 @@ static int virtqueue_add_indirect_packed(struct 
> > > > > vring_virtqueue *vq,
> > > > >
> > > > >   head = vq->packed.next_avail_idx;
> > > > >   desc = alloc_indirect_packed(total_sg, gfp);
> > > > > + if (!desc)
> > > > > + /* fall back on direct */
> > > >
> > > > this comment belongs in virtqueue_add_packed right after
> > > > return.
> > > >
> > > > > + return -EAGAIN;
> > > > >
> > > >
> > > > -ENOMEM surely?
> > >
> > > virtqueue_add_indirect_packed() will return -ENOMEM when dma mmap fails, 
> > > so we
> > > have to choose a different error code.
> > >
> > > Thanks.
> >
> > That's exactly the point.  Why would we want to recover from allocation
> > failure but not DMA map failure?
> 
> From indirect to direct mode, there is no need to allocate memory, so if we
> encounter memory allocation failure, we can fall back from indirect to direct
> mode. Memory allocation failure is a temporary problem.
> 
> And if you encounter a dma mmap error, this situation should be notified to 
> the
> upper layer.
> 
> Thanks.

Why did the DMA map fail?
A common reason is precisely that we are running
out of buffer space. Using direct, which does not
need extra buffer space, thus has a chance to work.


> >
> > > >
> > > > >   if (unlikely(vq->vq.num_free < 1)) {
> > > > >   pr_debug("Can't add buf len 1 - avail = 0\n");
> > > > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct 
> > > > > virtqueue *_vq,
> > > > >   unsigned int i, n, c, descs_used, err_idx;
> > > > >   __le16 head_flags, flags;
> > > > >   u16 head, id, prev, curr, avail_used_flags;
> > > > > + int err;
> > > > >
> > > > >   START_USE(vq);
> > > > >
> > > > > @@ -1191,9 +1195,12 @@ static inline int virtqueue_add_packed(struct 
> > > > > virtqueue *_vq,
> > > > >
> > > > >   BUG_ON(total_sg == 0);
> > > > >
> > > > > - if (virtqueue_use_indirect(_vq, total_sg))
> > > > > - return virtqueue_add_indirect_packed(vq, sgs, total_sg,
> > > > > - out_sgs, in_sgs, data, gfp);
> > > > > + if (virtqueue_use_indirect(_vq, total_sg)) {
> > > > > + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, 
> > > > > out_sgs,
> > > > > + in_sgs, data, gfp);
> > > > > + if (err != -EAGAIN)
> > > > > + return err;
> > > > > + }
> > > > >
> > > > >   head = vq->packed.next_avail_idx;
> > > > >   avail_used_flags = vq->packed.avail_used_flags;
> > > > > --
> > > > > 2.31.0
> > > >
> >



Re: [PATCH V6] gpio: virtio: Add IRQ support

2021-10-20 Thread Andy Shevchenko
On Wednesday, October 20, 2021, Viresh Kumar 
wrote:

> This patch adds IRQ support for the virtio GPIO driver. Note that this
> uses the irq_bus_lock/unlock() callbacks, since those operations over
> virtio may sleep.
>
> Reviewed-by: Linus Walleij 
> Signed-off-by: Viresh Kumar 
> ---
> Bartosz,
>
> The spec changes are close to merging now, I will let you know once the
> ballot
> is closed and the spec changes are merged. You can then pick this patch for
> 5.16.
>
> V5->V6:
> - Sent it separately as the other patches are already merged.
> - Queue the buffers only after enabling the irq (as per latest spec).
> - Migrate to raw_spinlock_t.
>
>  drivers/gpio/Kconfig |   1 +
>  drivers/gpio/gpio-virtio.c   | 310 ++-
>  include/uapi/linux/virtio_gpio.h |  25 +++
>  3 files changed, 332 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
> index fae5141251e5..bfa723ff8e7c 100644
> --- a/drivers/gpio/Kconfig
> +++ b/drivers/gpio/Kconfig
> @@ -1674,6 +1674,7 @@ config GPIO_MOCKUP
>  config GPIO_VIRTIO
> tristate "VirtIO GPIO support"
> depends on VIRTIO
> +   select GPIOLIB_IRQCHIP
> help
>   Say Y here to enable guest support for virtio-based GPIO
> controllers.
>
> diff --git a/drivers/gpio/gpio-virtio.c b/drivers/gpio/gpio-virtio.c
> index d24f1c9264bc..73fbe2eda4b9 100644
> --- a/drivers/gpio/gpio-virtio.c
> +++ b/drivers/gpio/gpio-virtio.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -28,12 +29,30 @@ struct virtio_gpio_line {
> unsigned int rxlen;
>  };
>
> +struct vgpio_irq_line {
> +   u8 type;
> +   bool disabled;
> +   bool masked;
> +   bool queued;
> +   bool update_pending;
> +   bool queue_pending;
> +
> +   struct virtio_gpio_irq_request ireq cacheline_aligned;
> +   struct virtio_gpio_irq_response ires cacheline_aligned;
> +};
> +
>  struct virtio_gpio {
> struct virtio_device *vdev;
> struct mutex lock; /* Protects virtqueue operation */
> struct gpio_chip gc;
> struct virtio_gpio_line *lines;
> struct virtqueue *request_vq;
> +
> +   /* irq support */
> +   struct virtqueue *event_vq;
> +   struct mutex irq_lock; /* Protects irq operation */
> +   raw_spinlock_t eventq_lock; /* Protects queuing of the buffer */
> +   struct vgpio_irq_line *irq_lines;
>  };
>
>  static int _virtio_gpio_req(struct virtio_gpio *vgpio, u16 type, u16 gpio,
> @@ -186,6 +205,244 @@ static void virtio_gpio_set(struct gpio_chip *gc,
> unsigned int gpio, int value)
> virtio_gpio_req(vgpio, VIRTIO_GPIO_MSG_SET_VALUE, gpio, value,
> NULL);
>  }
>
> +/* Interrupt handling */
> +static void virtio_gpio_irq_prepare(struct virtio_gpio *vgpio, u16 gpio)
> +{
> +   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[gpio];
> +   struct virtio_gpio_irq_request *ireq = &irq_line->ireq;
> +   struct virtio_gpio_irq_response *ires = &irq_line->ires;
> +   struct scatterlist *sgs[2], req_sg, res_sg;
> +   int ret;
> +
> +   if (WARN_ON(irq_line->queued || irq_line->masked ||
> irq_line->disabled))
> +   return;
> +
> +   ireq->gpio = cpu_to_le16(gpio);
> +   sg_init_one(&req_sg, ireq, sizeof(*ireq));
> +   sg_init_one(&res_sg, ires, sizeof(*ires));
> +   sgs[0] = &req_sg;
> +   sgs[1] = &res_sg;
> +
> +   ret = virtqueue_add_sgs(vgpio->event_vq, sgs, 1, 1, irq_line,
> GFP_ATOMIC);
> +   if (ret) {
> +   dev_err(&vgpio->vdev->dev, "failed to add request to
> eventq\n");
> +   return;
> +   }
> +
> +   irq_line->queued = true;
> +   virtqueue_kick(vgpio->event_vq);
> +}
> +
> +static void virtio_gpio_irq_enable(struct irq_data *d)
> +{
> +   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
> +   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
> +   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
> +
> +   raw_spin_lock(&vgpio->eventq_lock);
> +   irq_line->disabled = false;
> +   irq_line->masked = false;
> +   irq_line->queue_pending = true;
> +   raw_spin_unlock(&vgpio->eventq_lock);
> +
> +   irq_line->update_pending = true;
> +}
> +
> +static void virtio_gpio_irq_disable(struct irq_data *d)
> +{
> +   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
> +   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
> +   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
> +
> +   raw_spin_lock(&vgpio->eventq_lock);
> +   irq_line->disabled = true;
> +   irq_line->masked = true;
> +   irq_line->queue_pending = false;
> +   raw_spin_unlock(&vgpio->eventq_lock);
> +
> +   irq_line->update_pending = true;
> +}
> +
> +static void virtio_gpio_irq_mask(struct irq_data *d)
> +{
> +   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);

Re: [PATCH v5 04/16] x86/tdx: Make pages shared in ioremap()

2021-10-20 Thread Tom Lendacky via Virtualization

On 10/8/21 7:36 PM, Kuppuswamy Sathyanarayanan wrote:

From: "Kirill A. Shutemov" 

All ioremap()ed pages that are not backed by normal memory (NONE or
RESERVED) have to be mapped as shared.

Reuse the infrastructure from AMD SEV code.

Note that DMA code doesn't use ioremap() to convert memory to shared as
DMA buffers backed by normal memory. DMA code make buffer shared with
set_memory_decrypted().

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Andi Kleen 
Reviewed-by: Tony Luck 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---

Changes since v4:
  * Renamed "protected_guest" to "cc_guest".
  * Replaced use of prot_guest_has() with cc_guest_has()
  * Modified the patch to adapt to latest version cc_guest_has()
changes.

Changes since v3:
  * Rebased on top of Tom Lendacky's protected guest
changes (https://lore.kernel.org/patchwork/cover/1468760/)

Changes since v1:
  * Fixed format issues in commit log.

  arch/x86/include/asm/pgtable.h |  4 
  arch/x86/mm/ioremap.c  |  8 ++--
  include/linux/cc_platform.h| 13 +
  3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..ecefccbdf2e3 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -21,6 +21,10 @@
  #define pgprot_encrypted(prot)__pgprot(__sme_set(pgprot_val(prot)))
  #define pgprot_decrypted(prot)__pgprot(__sme_clr(pgprot_val(prot)))
  
+/* Make the page accesable by VMM for confidential guests */

+#define pgprot_cc_guest(prot) __pgprot(pgprot_val(prot) |  \
+ tdx_shared_mask())


So this is only for TDX guests, so maybe a name that is less generic.

Alternatively, you could create pgprot_private()/pgprot_shared() functions 
that check for SME/SEV or TDX and do the proper thing.


Then you can redefine pgprot_encrypted()/pgprot_decrypted() to the new 
functions?
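Something along these lines, perhaps (a sketch of that suggestion only: it
reuses tdx_shared_mask() and the cc_platform_has() attributes from this
series, and the helper names are the proposal above, not existing kernel API):

static inline pgprot_t pgprot_private(pgprot_t prot)
{
	if (cc_platform_has(CC_ATTR_GUEST_TDX))
		return __pgprot(pgprot_val(prot) & ~tdx_shared_mask());
	return __pgprot(__sme_set(pgprot_val(prot)));	/* SME/SEV */
}

static inline pgprot_t pgprot_shared(pgprot_t prot)
{
	if (cc_platform_has(CC_ATTR_GUEST_TDX))
		return __pgprot(pgprot_val(prot) | tdx_shared_mask());
	return __pgprot(__sme_clr(pgprot_val(prot)));	/* SME/SEV */
}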



+
  #ifndef __ASSEMBLY__
  #include 
  #include 
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 026031b3b782..83daa3f8f39c 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -17,6 +17,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -26,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "physaddr.h"
  
@@ -87,8 +89,8 @@ static unsigned int __ioremap_check_ram(struct resource *res)

  }
  
  /*

- * In a SEV guest, NONE and RESERVED should not be mapped encrypted because
- * there the whole memory is already encrypted.
+ * In a SEV or TDX guest, NONE and RESERVED should not be mapped encrypted (or
+ * private in TDX case) because there the whole memory is already encrypted.
   */
  static unsigned int __ioremap_check_encrypted(struct resource *res)
  {
@@ -246,6 +248,8 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long 
size,
prot = PAGE_KERNEL_IO;
if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
prot = pgprot_encrypted(prot);
+   else if (cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT))
+   prot = pgprot_cc_guest(prot);


And if you do the new functions this could be:

if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
prot = pgprot_private(prot);
else
prot = pgprot_shared(prot);

Thanks,
Tom

  
  	switch (pcm) {

case _PAGE_CACHE_MODE_UC:
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 7728574d7783..edb1d7a2f6af 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -81,6 +81,19 @@ enum cc_attr {
 * Examples include TDX Guest.
 */
CC_ATTR_GUEST_UNROLL_STRING_IO,
+
+   /**
+* @CC_ATTR_GUEST_SHARED_MAPPING_INIT: IO Remapped memory is marked
+* as shared.
+*
+* The platform/OS is running as a guest/virtual machine and
+* initializes all IO remapped memory as shared.
+*
+* Examples include TDX Guest (SEV marks all pages as shared by default
+* so this feature cannot be enabled for it).
+*/
+   CC_ATTR_GUEST_SHARED_MAPPING_INIT,
+
  };
  
  #ifdef CONFIG_ARCH_HAS_CC_PLATFORM





Re: [PATCH v5 01/16] x86/mm: Move force_dma_unencrypted() to common code

2021-10-20 Thread Tom Lendacky via Virtualization

On 10/8/21 7:36 PM, Kuppuswamy Sathyanarayanan wrote:

From: "Kirill A. Shutemov" 

Intel TDX doesn't allow VMM to access guest private memory. Any memory
that is required for communication with VMM must be shared explicitly
by setting the bit in page table entry. After setting the shared bit,
the conversion must be completed with MapGPA hypercall. Details about
MapGPA hypercall can be found in [1], sec 3.2.

The call informs VMM about the conversion between private/shared
mappings. The shared memory is similar to unencrypted memory in AMD
SME/SEV terminology but the underlying process of sharing/un-sharing
the memory is different for Intel TDX guest platform.

SEV assumes that I/O devices can only do DMA to "decrypted" physical
addresses without the C-bit set. In order for the CPU to interact with
this memory, the CPU needs a decrypted mapping. To add this support,
AMD SME code forces force_dma_unencrypted() to return true for
platforms that support AMD SEV feature. It will be used for DMA memory
allocation API to trigger set_memory_decrypted() for platforms that
support AMD SEV feature.

TDX is similar. So, to communicate with I/O devices, related pages need
to be marked as shared. As mentioned above, shared memory in TDX
architecture is similar to decrypted memory in AMD SME/SEV. So similar
to AMD SEV, force_dma_unencrypted() has to forced to return true. This
support is added in other patches in this series.

So move force_dma_unencrypted() out of AMD specific code and call AMD
specific (amd_force_dma_unencrypted()) initialization function from it.
force_dma_unencrypted() will be modified by later patches to include
Intel TDX guest platform specific initialization.

Also, introduce new config option X86_MEM_ENCRYPT_COMMON that has to be
selected by all x86 memory encryption features. This will be selected
by both AMD SEV and Intel TDX guest config options.

This is preparation for TDX changes in DMA code and it has no
functional change.


Can force_dma_unencrypted() be moved to arch/x86/kernel/cc_platform.c, 
instead of creating a new file? It might fit better with patch #6.


Thanks,
Tom



[1] - 
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Andi Kleen 
Reviewed-by: Tony Luck 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---

Changes since v4:
  * Removed used we/you from commit log.

Change since v3:
  * None

Changes since v1:
  * Removed sev_active(), sme_active() checks in force_dma_unencrypted().

  arch/x86/Kconfig  |  8 ++--
  arch/x86/include/asm/mem_encrypt_common.h | 18 ++
  arch/x86/mm/Makefile  |  2 ++
  arch/x86/mm/mem_encrypt.c |  3 ++-
  arch/x86/mm/mem_encrypt_common.c  | 17 +
  5 files changed, 45 insertions(+), 3 deletions(-)
  create mode 100644 arch/x86/include/asm/mem_encrypt_common.h
  create mode 100644 arch/x86/mm/mem_encrypt_common.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index af49ad084919..37b27412f52e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1519,16 +1519,20 @@ config X86_CPA_STATISTICS
  helps to determine the effectiveness of preserving large and huge
  page mappings when mapping protections are changed.
  
+config X86_MEM_ENCRYPT_COMMON

+   select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select DYNAMIC_PHYSICAL_MASK
+   def_bool n
+
  config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
select DMA_COHERENT_POOL
-   select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
-   select ARCH_HAS_FORCE_DMA_UNENCRYPTED
select INSTRUCTION_DECODER
select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
select ARCH_HAS_CC_PLATFORM
+   select X86_MEM_ENCRYPT_COMMON
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/include/asm/mem_encrypt_common.h 
b/arch/x86/include/asm/mem_encrypt_common.h
new file mode 100644
index ..697bc40a4e3d
--- /dev/null
+++ b/arch/x86/include/asm/mem_encrypt_common.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_MEM_ENCRYPT_COMMON_H
+#define _ASM_X86_MEM_ENCRYPT_COMMON_H
+
+#include 
+#include 
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+bool amd_force_dma_unencrypted(struct device *dev);
+#else /* CONFIG_AMD_MEM_ENCRYPT */
+static inline bool amd_force_dma_unencrypted(struct device *dev)
+{
+   return false;
+}
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 5864219221ca..b31cb52bf1bd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -52,6 +52,8 @@ obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTI

Re: [PATCH v5 06/16] x86/tdx: Make DMA pages shared

2021-10-20 Thread Tom Lendacky via Virtualization

On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote:

From: "Kirill A. Shutemov" 

Just like MKTME, TDX reassigns bits of the physical address for
metadata.  MKTME used several bits for an encryption KeyID. TDX
uses a single bit in guests to communicate whether a physical page
should be protected by TDX as private memory (bit set to 0) or
unprotected and shared with the VMM (bit set to 1).

__set_memory_enc_dec() is now aware of TDX and sets the Shared bit
accordingly, followed by the relevant TDX hypercall.

Also, do TDX_ACCEPT_PAGE on every 4k page after mapping the GPA range
when converting memory to private. The 4k page size limit is due
to a current TDX spec restriction. Also, if the GPA (range) was
already mapped as an active, private page, the host VMM may remove
the private page from the TD by following the “Removing TD Private
Pages” sequence in the Intel TDX-module specification [1] to safely
block the mapping(s), flush the TLB and cache, and remove the
mapping(s).

BUG() if TDX_ACCEPT_PAGE fails (except the "previously accepted page"
case), as the guest is completely hosed if it can't access memory.

[1] 
https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1eas-v0.85.039.pdf

Tested-by: Kai Huang 
Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Sean Christopherson 
Reviewed-by: Andi Kleen 
Reviewed-by: Tony Luck 
Signed-off-by: Kuppuswamy Sathyanarayanan 



...


diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
index f063c885b0a5..119a9056efbb 100644
--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -9,9 +9,18 @@
  
  #include 

  #include 
+#include 
  
  /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */

  bool force_dma_unencrypted(struct device *dev)
  {
-   return amd_force_dma_unencrypted(dev);
+   if (cc_platform_has(CC_ATTR_GUEST_TDX) &&
+   cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+   return true;
+
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) ||
+   cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
+   return amd_force_dma_unencrypted(dev);
+
+   return false;


Assuming the original force_dma_unencrypted() function was moved here or 
cc_platform.c, then you shouldn't need any changes. Both SEV and TDX 
require true be returned if CC_ATTR_GUEST_MEM_ENCRYPT returns true. And 
then TDX should never return true for CC_ATTR_HOST_MEM_ENCRYPT.



  }
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 527957586f3c..6c531d5cb5fd 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -30,6 +30,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "../mm_internal.h"
  
@@ -1981,8 +1982,10 @@ int set_memory_global(unsigned long addr, int numpages)

__pgprot(_PAGE_GLOBAL), 0);
  }
  
-static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)

+static int __set_memory_protect(unsigned long addr, int numpages, bool protect)
  {
+   pgprot_t mem_protected_bits, mem_plain_bits;
+   enum tdx_map_type map_type;
struct cpa_data cpa;
int ret;
  
@@ -1997,8 +2000,25 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)

memset(&cpa, 0, sizeof(cpa));
cpa.vaddr = &addr;
cpa.numpages = numpages;
-   cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
-   cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
+
+   if (cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT)) {
+   mem_protected_bits = __pgprot(0);
+   mem_plain_bits = pgprot_cc_shared_mask();


How about having generic versions for both shared and private that return 
the proper value for SEV or TDX. Then this remains looking similar to how 
it does now, just replacing the __pgprot() calls with the appropriate 
pgprot_cc_{shared,private}_mask().
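
For illustration, the mask setup could then read roughly like this (a
sketch only -- pgprot_cc_private_mask()/pgprot_cc_shared_mask() are
hypothetical helper names, not something that exists today):

	/* Each platform supplies both masks: _PAGE_ENC/0 on SME/SEV, 0/shared bit on TDX */
	mem_protected_bits = pgprot_cc_private_mask();
	mem_plain_bits = pgprot_cc_shared_mask();

	if (protect) {
		cpa.mask_set = mem_protected_bits;
		cpa.mask_clr = mem_plain_bits;
	} else {
		cpa.mask_set = mem_plain_bits;
		cpa.mask_clr = mem_protected_bits;
	}

so the cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT) branch in the
mask setup goes away.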


Thanks,
Tom


+   } else {
+   mem_protected_bits = __pgprot(_PAGE_ENC);
+   mem_plain_bits = __pgprot(0);
+   }
+
+   if (protect) {
+   cpa.mask_set = mem_protected_bits;
+   cpa.mask_clr = mem_plain_bits;
+   map_type = TDX_MAP_PRIVATE;
+   } else {
+   cpa.mask_set = mem_plain_bits;
+   cpa.mask_clr = mem_protected_bits;
+   map_type = TDX_MAP_SHARED;
+   }
+
cpa.pgd = init_mm.pgd;
  
  	/* Must avoid aliasing mappings in the highmem code */

@@ -2006,9 +2026,17 @@ static int __set_memory_enc_dec(unsigne

Re: [PATCH v5 07/16] x86/kvm: Use bounce buffers for TD guest

2021-10-20 Thread Tom Lendacky via Virtualization

On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote:

From: "Kirill A. Shutemov" 

Intel TDX doesn't allow VMM to directly access guest private memory.
Any memory that is required for communication with VMM must be shared
explicitly. The same rule applies for any DMA to and from TDX guest.
All DMA pages have to be marked as shared pages. A generic way to achieve
this without any changes to device drivers is to use the SWIOTLB
framework.

This method of handling is similar to AMD SEV. So extend this support
for TDX guest as well. Also, since there is some common code between
AMD SEV and TDX guest in mem_encrypt_init(), move it to
mem_encrypt_common.c and call the AMD-specific init function from it.

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Andi Kleen 
Reviewed-by: Tony Luck 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---

Changes since v4:
  * Replaced prot_guest_has() with cc_guest_has().

Changes since v3:
  * Rebased on top of Tom Lendacky's protected guest
changes (https://lore.kernel.org/patchwork/cover/1468760/)

Changes since v1:
  * Removed sme_me_mask check for amd_mem_encrypt_init() in mem_encrypt_init().

  arch/x86/include/asm/mem_encrypt_common.h |  3 +++
  arch/x86/kernel/tdx.c |  2 ++
  arch/x86/mm/mem_encrypt.c |  5 +
  arch/x86/mm/mem_encrypt_common.c  | 14 ++
  4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt_common.h 
b/arch/x86/include/asm/mem_encrypt_common.h
index 697bc40a4e3d..bc90e565bce4 100644
--- a/arch/x86/include/asm/mem_encrypt_common.h
+++ b/arch/x86/include/asm/mem_encrypt_common.h
@@ -8,11 +8,14 @@
  
  #ifdef CONFIG_AMD_MEM_ENCRYPT

  bool amd_force_dma_unencrypted(struct device *dev);
+void __init amd_mem_encrypt_init(void);
  #else /* CONFIG_AMD_MEM_ENCRYPT */
  static inline bool amd_force_dma_unencrypted(struct device *dev)
  {
return false;
  }
+
+static inline void amd_mem_encrypt_init(void) {}
  #endif /* CONFIG_AMD_MEM_ENCRYPT */
  
  #endif

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 433f366ca25c..ce8e3019b812 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -12,6 +12,7 @@
  #include 
  #include 
  #include  /* force_sig_fault() */
+#include 
  
  /* TDX Module call Leaf IDs */

  #define TDX_GET_INFO  1
@@ -577,6 +578,7 @@ void __init tdx_early_init(void)
pv_ops.irq.halt = tdx_halt;
  
  	legacy_pic = &null_legacy_pic;

+   swiotlb_force = SWIOTLB_FORCE;
  
  	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "tdx:cpu_hotplug",

  NULL, tdx_cpu_offline_prepare);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5d7fbed73949..8385bc4565e9 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -438,14 +438,11 @@ static void print_mem_encrypt_feature_info(void)
  }
  
  /* Architecture __weak replacement functions */

-void __init mem_encrypt_init(void)
+void __init amd_mem_encrypt_init(void)
  {
if (!sme_me_mask)
return;
  
-	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */

-   swiotlb_update_mem_attributes();
-
/*
 * With SEV, we need to unroll the rep string I/O instructions,
 * but SEV-ES supports them through the #VC handler.
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
index 119a9056efbb..6fe44c6cb753 100644
--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -10,6 +10,7 @@
  #include 
  #include 
  #include 
+#include 
  
  /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */

  bool force_dma_unencrypted(struct device *dev)
@@ -24,3 +25,16 @@ bool force_dma_unencrypted(struct device *dev)
  
  	return false;

  }
+
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+   /*
+* For TDX guest or SEV/SME, call into SWIOTLB to update
+* the SWIOTLB DMA buffers
+*/
+   if (sme_me_mask || cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))


Can't you just make this:

if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))

SEV will return true if sme_me_mask is not zero and TDX should only return 
true if it is TDX guest, right?
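
That would make the hunk read, roughly (a sketch, assuming
CC_ATTR_MEM_ENCRYPT is true for SME/SEV as well as for TDX guests):

/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void)
{
	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
		swiotlb_update_mem_attributes();

	amd_mem_encrypt_init();
}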


Thanks,
Tom


+   swiotlb_update_mem_attributes();
+
+   amd_mem_encrypt_init();
+}




Re: [PATCH v5 06/16] x86/tdx: Make DMA pages shared

2021-10-20 Thread Tom Lendacky via Virtualization

On 10/20/21 11:45 AM, Sathyanarayanan Kuppuswamy wrote:

On 10/20/21 9:33 AM, Tom Lendacky wrote:

On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote:


...


  bool force_dma_unencrypted(struct device *dev)
  {
-    return amd_force_dma_unencrypted(dev);
+    if (cc_platform_has(CC_ATTR_GUEST_TDX) &&
+    cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+    return true;
+
+    if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) ||
+    cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
+    return amd_force_dma_unencrypted(dev);
+
+    return false;


Assuming the original force_dma_unencrypted() function was moved here or 
cc_platform.c, then you shouldn't need any changes. Both SEV and TDX 
require true be returned if CC_ATTR_GUEST_MEM_ENCRYPT returns true. And 
then TDX should never return true for CC_ATTR_HOST_MEM_ENCRYPT.



For the non-TDX case, i.e. CC_ATTR_HOST_MEM_ENCRYPT, we should still call
amd_force_dma_unencrypted(), right?


What I'm saying is that you wouldn't have amd_force_dma_unencrypted(). I 
think the whole force_dma_unencrypted() can exist as-is in a different 
file, whether that's cc_platform.c or mem_encrypt_common.c.


It will return true for an SEV or TDX guest, true for an SME host based on 
the DMA mask or else false. That should work just fine for TDX.
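
For reference, a sketch of what that single common helper could look
like -- essentially today's SEV/SME implementation moved as-is, with
nothing TDX-specific added (not necessarily the final code):

bool force_dma_unencrypted(struct device *dev)
{
	/* SEV and TDX guests: all DMA must use unencrypted (shared) buffers */
	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
		return true;

	/*
	 * SME host: force unencrypted DMA when the device can't address
	 * memory above the encryption mask.
	 */
	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
						dev->dma_mask);

		if (dma_dev_mask <= dma_enc_mask)
			return true;
	}

	return false;
}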


Thanks,
Tom






  }
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 527957586f3c..6c531d5cb5fd 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -30,6 +30,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "../mm_internal.h"
@@ -1981,8 +1982,10 @@ int set_memory_global(unsigned long addr, int 
numpages)

  __pgprot(_PAGE_GLOBAL), 0);
  }
-static int __set_memory_enc_dec(unsigned long addr, int numpages, bool 
enc)
+static int __set_memory_protect(unsigned long addr, int numpages, bool 
protect)

  {
+    pgprot_t mem_protected_bits, mem_plain_bits;
+    enum tdx_map_type map_type;
  struct cpa_data cpa;
  int ret;
@@ -1997,8 +2000,25 @@ static int __set_memory_enc_dec(unsigned long 
addr, int numpages, bool enc)

  memset(&cpa, 0, sizeof(cpa));
  cpa.vaddr = &addr;
  cpa.numpages = numpages;
-    cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
-    cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
+
+    if (cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT)) {
+    mem_protected_bits = __pgprot(0);
+    mem_plain_bits = pgprot_cc_shared_mask();


How about having generic versions for both shared and private that 
return the proper value for SEV or TDX. Then this remains looking 
similar to how it does now, just replacing the __pgprot() calls with the 
appropriate pgprot_cc_{shared,private}_mask().


Makes sense.



Thanks,
Tom


+    } else {
+    mem_protected_bits = __pgprot(_PAGE_ENC);
+    mem_plain_bits = __pgprot(0);
+    }
+
+    if (protect) {
+    cpa.mask_set = mem_protected_bits;
+    cpa.mask_clr = mem_plain_bits;
+    map_type = TDX_MAP_PRIVATE;
+    } else {
+    cpa.mask_set = mem_plain_bits;
+    cpa.mask_clr = mem_protected_bits;
+    map_type = TDX_MAP_SHARED;
+    }
+
  cpa.pgd = init_mm.pgd;
  /* Must avoid aliasing mappings in the highmem code */
@@ -2006,9 +2026,17 @@ static int __set_memory_enc_dec(unsigned long 
addr, int numpages, bool enc)

  vm_unmap_aliases();
  /*
- * Before changing the encryption attribute, we need to flush caches.
+ * Before changing the encryption attribute, flush caches.
+ *
+ * For TDX, guest is responsible for flushing caches on 
private->shared

+ * transition. VMM is responsible for flushing on shared->private.
   */
-    cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
+    if (cc_platform_has(CC_ATTR_GUEST_TDX)) {
+    if (map_type == TDX_MAP_SHARED)
+    cpa_flush(&cpa, 1);
+    } else {
+    cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
+    }
  ret = __change_page_attr_set_clr(&cpa, 1);
@@ -2021,18 +2049,21 @@ static int __set_memory_enc_dec(unsigned long 
addr, int numpages, bool enc)

   */
  cpa_flush(&cpa, 0);
+    if (!ret && cc_platform_has(CC_ATTR_GUEST_SHARED_MAPPING_INIT))
+    ret = tdx_hcall_gpa_intent(__pa(addr), numpages, map_type);
+
  return ret;
  }
  int set_memory_encrypted(unsigned long addr, int numpages)
  {
-    return __set_memory_enc_dec(addr, numpages, true);
+    return __set_memory_protect(addr, numpages, true);
  }
  EXPORT_SYMBOL_GPL(set_memory_encrypted);
  int set_memory_decrypted(unsigned long addr, int numpages)
  {
-    return __set_memory_enc_dec(addr, numpages, false);
+    return __set_memory_protect(addr, numpages, false);
  }
  EXPORT_SYMBOL_GPL(set_memory_decrypted);





Re: [PATCH v5 07/16] x86/kvm: Use bounce buffers for TD guest

2021-10-20 Thread Tom Lendacky via Virtualization

On 10/20/21 11:50 AM, Sathyanarayanan Kuppuswamy wrote:



On 10/20/21 9:39 AM, Tom Lendacky wrote:

On 10/8/21 7:37 PM, Kuppuswamy Sathyanarayanan wrote:

From: "Kirill A. Shutemov" 

Intel TDX doesn't allow VMM to directly access guest private memory.
Any memory that is required for communication with VMM must be shared
explicitly. The same rule applies for any DMA to and from TDX guest.
All DMA pages have to be marked as shared pages. A generic way to achieve
this without any changes to device drivers is to use the SWIOTLB
framework.

This method of handling is similar to AMD SEV. So extend this support
for TDX guest as well. Also, since there is some common code between
AMD SEV and TDX guest in mem_encrypt_init(), move it to
mem_encrypt_common.c and call the AMD-specific init function from it.

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Andi Kleen 
Reviewed-by: Tony Luck 
Signed-off-by: Kuppuswamy Sathyanarayanan 


---

Changes since v4:
  * Replaced prot_guest_has() with cc_guest_has().

Changes since v3:
  * Rebased on top of Tom Lendacky's protected guest
    changes 
(https://lore.kernel.org/patchwork/cover/1468760/)



Changes since v1:
  * Removed sme_me_mask check for amd_mem_encrypt_init() in 
mem_encrypt_init().


  arch/x86/include/asm/mem_encrypt_common.h |  3 +++
  arch/x86/kernel/tdx.c |  2 ++
  arch/x86/mm/mem_encrypt.c |  5 +
  arch/x86/mm/mem_encrypt_common.c  | 14 ++
  4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt_common.h 
b/arch/x86/include/asm/mem_encrypt_common.h

index 697bc40a4e3d..bc90e565bce4 100644
--- a/arch/x86/include/asm/mem_encrypt_common.h
+++ b/arch/x86/include/asm/mem_encrypt_common.h
@@ -8,11 +8,14 @@
  #ifdef CONFIG_AMD_MEM_ENCRYPT
  bool amd_force_dma_unencrypted(struct device *dev);
+void __init amd_mem_encrypt_init(void);
  #else /* CONFIG_AMD_MEM_ENCRYPT */
  static inline bool amd_force_dma_unencrypted(struct device *dev)
  {
  return false;
  }
+
+static inline void amd_mem_encrypt_init(void) {}
  #endif /* CONFIG_AMD_MEM_ENCRYPT */
  #endif
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 433f366ca25c..ce8e3019b812 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -12,6 +12,7 @@
  #include 
  #include 
  #include  /* force_sig_fault() */
+#include 
  /* TDX Module call Leaf IDs */
  #define TDX_GET_INFO    1
@@ -577,6 +578,7 @@ void __init tdx_early_init(void)
  pv_ops.irq.halt = tdx_halt;
  legacy_pic = &null_legacy_pic;
+    swiotlb_force = SWIOTLB_FORCE;
  cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "tdx:cpu_hotplug",
    NULL, tdx_cpu_offline_prepare);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5d7fbed73949..8385bc4565e9 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -438,14 +438,11 @@ static void print_mem_encrypt_feature_info(void)
  }
  /* Architecture __weak replacement functions */
-void __init mem_encrypt_init(void)
+void __init amd_mem_encrypt_init(void)
  {
  if (!sme_me_mask)
  return;
-    /* Call into SWIOTLB to update the SWIOTLB DMA buffers */
-    swiotlb_update_mem_attributes();
-
  /*
   * With SEV, we need to unroll the rep string I/O instructions,
   * but SEV-ES supports them through the #VC handler.
diff --git a/arch/x86/mm/mem_encrypt_common.c 
b/arch/x86/mm/mem_encrypt_common.c

index 119a9056efbb..6fe44c6cb753 100644
--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -10,6 +10,7 @@
  #include 
  #include 
  #include 
+#include 
  /* Override for DMA direct allocation check - 
ARCH_HAS_FORCE_DMA_UNENCRYPTED */

  bool force_dma_unencrypted(struct device *dev)
@@ -24,3 +25,16 @@ bool force_dma_unencrypted(struct device *dev)
  return false;
  }
+
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+    /*
+ * For TDX guest or SEV/SME, call into SWIOTLB to update
+ * the SWIOTLB DMA buffers
+ */
+    if (sme_me_mask || cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))


Can't you just make this:

 if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))

SEV will return true if sme_me_mask is not zero and TDX should only 
return true if it is TDX guest, right?


Yes. It can be simplified.

But where shall we leave this function, cc_platform.c or here?


Either one works... it all depends on how the maintainers feel about 
creating/using mem_encrypt_common.c or using cc_platform.c.


Thanks,
Tom





Thanks,
Tom


+    swiotlb_update_mem_attributes();
+
+

Re: [PATCH v4 2/2] virtio_ring: check desc == NULL when using indirect with packed

2021-10-20 Thread Xuan Zhuo
On Wed, 20 Oct 2021 10:52:24 -0400, Michael S. Tsirkin  wrote:
> On Wed, Oct 20, 2021 at 07:07:43PM +0800, Xuan Zhuo wrote:
> > On Wed, 20 Oct 2021 04:24:33 -0400, Michael S. Tsirkin  
> > wrote:
> > > On Wed, Oct 20, 2021 at 10:19:46AM +0800, Xuan Zhuo wrote:
> > > > On Tue, 19 Oct 2021 09:21:51 -0400, Michael S. Tsirkin 
> > > >  wrote:
> > > > > On Tue, Oct 19, 2021 at 07:52:35PM +0800, Xuan Zhuo wrote:
> > > > > > When using indirect with packed, we don't check for allocation 
> > > > > > failures.
> > > > > > This patch checks that and falls back on direct.
> > > > > >
> > > > > > Fixes: 1ce9e6055fa ("virtio_ring: introduce packed ring support")
> > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > ---
> > > > > >  drivers/virtio/virtio_ring.c | 13 ++---
> > > > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c 
> > > > > > b/drivers/virtio/virtio_ring.c
> > > > > > index 91a46c4da87d..44a03b6e4dc4 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -1065,6 +1065,9 @@ static int 
> > > > > > virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > > > >
> > > > > > head = vq->packed.next_avail_idx;
> > > > > > desc = alloc_indirect_packed(total_sg, gfp);
> > > > > > +   if (!desc)
> > > > > > +   /* fall back on direct */
> > > > >
> > > > > this comment belongs in virtqueue_add_packed right after
> > > > > return.
> > > > >
> > > > > > +   return -EAGAIN;
> > > > > >
> > > > >
> > > > > -ENOMEM surely?
> > > >
> > > > virtqueue_add_indirect_packed() will return -ENOMEM when DMA map 
> > > > fails, so we
> > > > have to choose a different error code.
> > > >
> > > > Thanks.
> > >
> > > That's exactly the point.  Why would we want to recover from allocation
> > > failure but not DMA map failure?
> >
> > From indirect to direct mode, there is no need to allocate memory, so if we
> > encounter memory allocation failure, we can fall back from indirect to 
> > direct
> > mode. Memory allocation failure is a temporary problem.
> >
> > And if you encounter a DMA map error, this situation should be notified to 
> > the
> > upper layer.
> >
> > Thanks.
>
> Why did dma map fail?
> A common reason is precisely that we are running
> out of buffer space. Using direct which does not
> need extra buffer space thus has a chance to work.

It turns out that this is the case, then I have no problem.

Thanks.
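
For reference, with the comment moved as suggested, the fallback in
virtqueue_add_packed() would read roughly like this (a sketch, not the
final patch):

	if (virtqueue_use_indirect(_vq, total_sg)) {
		err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
						    in_sgs, data, gfp);
		if (err != -EAGAIN)
			return err;

		/* fall back on direct */
	}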

>
>
> > >
> > > > >
> > > > > > if (unlikely(vq->vq.num_free < 1)) {
> > > > > > pr_debug("Can't add buf len 1 - avail = 0\n");
> > > > > > @@ -1176,6 +1179,7 @@ static inline int virtqueue_add_packed(struct 
> > > > > > virtqueue *_vq,
> > > > > > unsigned int i, n, c, descs_used, err_idx;
> > > > > > __le16 head_flags, flags;
> > > > > > u16 head, id, prev, curr, avail_used_flags;
> > > > > > +   int err;
> > > > > >
> > > > > > START_USE(vq);
> > > > > >
> > > > > > @@ -1191,9 +1195,12 @@ static inline int 
> > > > > > virtqueue_add_packed(struct virtqueue *_vq,
> > > > > >
> > > > > > BUG_ON(total_sg == 0);
> > > > > >
> > > > > > -   if (virtqueue_use_indirect(_vq, total_sg))
> > > > > > -   return virtqueue_add_indirect_packed(vq, sgs, total_sg,
> > > > > > -   out_sgs, in_sgs, data, gfp);
> > > > > > +   if (virtqueue_use_indirect(_vq, total_sg)) {
> > > > > > +   err = virtqueue_add_indirect_packed(vq, sgs, total_sg, 
> > > > > > out_sgs,
> > > > > > +   in_sgs, data, gfp);
> > > > > > +   if (err != -EAGAIN)
> > > > > > +   return err;
> > > > > > +   }
> > > > > >
> > > > > > head = vq->packed.next_avail_idx;
> > > > > > avail_used_flags = vq->packed.avail_used_flags;
> > > > > > --
> > > > > > 2.31.0
> > > > >
> > >
>


Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ

2021-10-20 Thread Jason Wang
On Wed, Oct 20, 2021 at 7:57 PM Eugenio Perez Martin
 wrote:
>
> On Wed, Oct 20, 2021 at 11:03 AM Jason Wang  wrote:
> >
> > On Wed, Oct 20, 2021 at 2:52 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Wed, Oct 20, 2021 at 4:07 AM Jason Wang  wrote:
> > > >
> > > > On Wed, Oct 20, 2021 at 10:02 AM Jason Wang  wrote:
> > > > >
> > > > > On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > > > > > > Use translations added in VhostIOVATree in SVQ.
> > > > > > > >
> > > > > > > > Now every element needs to store the previous address also, so 
> > > > > > > > VirtQueue
> > > > > > > > can consume the elements properly. This adds a little overhead 
> > > > > > > > per VQ
> > > > > > > > element, having to allocate more memory to stash them. As a 
> > > > > > > > possible
> > > > > > > > optimization, this allocation could be avoided if the 
> > > > > > > > descriptor is not
> > > > > > > > a chain but a single one, but this is left undone.
> > > > > > > >
> > > > > > > > TODO: iova range should be queried before, and add logic to 
> > > > > > > > fail when
> > > > > > > > GPA is outside of its range and memory listener or svq add it.
> > > > > > > >
> > > > > > > > Signed-off-by: Eugenio Pérez 
> > > > > > > > ---
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.h |   4 +-
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 130 
> > > > > > > > -
> > > > > > > >   hw/virtio/vhost-vdpa.c |  40 -
> > > > > > > >   hw/virtio/trace-events |   1 +
> > > > > > > >   4 files changed, 152 insertions(+), 23 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > > > > > > > b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > index b7baa424a7..a0e6b5267a 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > @@ -11,6 +11,7 @@
> > > > > > > >   #define VHOST_SHADOW_VIRTQUEUE_H
> > > > > > > >
> > > > > > > >   #include "hw/virtio/vhost.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > > > > >
> > > > > > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, 
> > > > > > > > unsigned idx,
> > > > > > > >   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > >   VhostShadowVirtqueue *svq);
> > > > > > > >
> > > > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int 
> > > > > > > > idx);
> > > > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int 
> > > > > > > > idx,
> > > > > > > > +VhostIOVATree *iova_map);
> > > > > > > >
> > > > > > > >   void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > > > > > > > b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > index 2fd0bab75d..9db538547e 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > @@ -11,12 +11,19 @@
> > > > > > > >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > > > > > >   #include "hw/virtio/vhost.h"
> > > > > > > >   #include "hw/virtio/virtio-access.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > >   #include "standard-headers/linux/vhost_types.h"
> > > > > > > >
> > > > > > > >   #include "qemu/error-report.h"
> > > > > > > >   #include "qemu/main-loop.h"
> > > > > > > >
> > > > > > > > +typedef struct SVQElement {
> > > > > > > > +VirtQueueElement elem;
> > > > > > > > +void **in_sg_stash;
> > > > > > > > +void **out_sg_stash;
> > > > > > > > +} SVQElement;
> > > > > > > > +
> > > > > > > >   /* Shadow virtqueue to relay notifications */
> > > > > > > >   typedef struct VhostShadowVirtqueue {
> > > > > > > >   /* Shadow vring */
> > > > > > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> > > > > > > >   /* Virtio device */
> > > > > > > >   VirtIODevice *vdev;
> > > > > > > >
> > > > > > > > +/* IOVA mapping if used */
> > > > > > > > +VhostIOVATree *iova_map;
> > > > > > > > +
> > > > > > > >   /* Map for returning guest's descriptors */
> > > > > > > > -VirtQueueElement **ring_id_maps;
> > > > > > > > +SVQElement **ring_id_maps;
> > > > > > > >
> > > > > > > >   /* Next head to expose to device */
> > > > > > > >   uint16_t avail_idx_shadow;
> > > > > > > > @@ -79,13 +89,6 @@ bool 
> > > > > > > > vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > > > > >   continue;
> > > > > > > >
> > > > > > > >   case VIRTIO_F_ACCESS_PLATFORM:
> > > > > > >

Re: [PATCH 1/2] i2c: virtio: disable timeout handling

2021-10-20 Thread Jie Deng



On 2021/10/20 19:03, Viresh Kumar wrote:

On 20-10-21, 12:55, Vincent Whitchurch wrote:

If the timeout cannot be disabled, then the driver should be fixed to
always copy buffers and hold on to them to avoid memory corruption in
the case of timeout, as I mentioned in my commit message.  That would be
quite a substantial change to the driver so it's not something I'm
personally comfortable with doing, especially not this late in the -rc
cycle, so I'd leave that to others.

Or we can avoid clearing up and freeing the buffers here until the
point where the buffers are returned by the host. Until that happens,
we can avoid taking new requests but return to the earlier caller with
timeout failure. That would avoid corruption, since the buffers are only
freed once the host returns them, and would not hang the kernel.



It seems similar to using "wait_for_completion". If the other side is
hacked, the guest may never get the buffers returned by the host, right?

For the moment, we can solve the problem by using a hardcoded big value
or disabling the timeout.

Over the long term, I think the backend should provide that timeout
value and guarantee that its processing time does not exceed that value.
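
Viresh's suggestion above -- hold on to the in-flight buffers but fail the
caller -- could look very roughly like this in the transfer path (purely a
hypothetical sketch; the xfer_stuck flag and the exact structure layout are
invented for illustration):

	if (!wait_for_completion_timeout(&vi->completion, adap->timeout)) {
		/*
		 * Don't free the buffers yet: the device may still write
		 * into them. Mark the adapter stuck, fail this transfer,
		 * and only release the buffers (and clear the flag) once
		 * the device returns them through the virtqueue callback.
		 */
		vi->xfer_stuck = true;
		return -ETIMEDOUT;
	}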




Re: [PATCH V6] gpio: virtio: Add IRQ support

2021-10-20 Thread Viresh Kumar
On 20-10-21, 18:10, Andy Shevchenko wrote:
> On Wednesday, October 20, 2021, Viresh Kumar 
> wrote:
> > +static int virtio_gpio_irq_set_type(struct irq_data *d, unsigned int
> > type)
> > +{
> > +   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
> > +   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
> > +   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
> > +
> > +   switch (type) {
> > +   case IRQ_TYPE_NONE:
> > +   type = VIRTIO_GPIO_IRQ_TYPE_NONE;
> > +   break;
> 
> 
> IIRC you add dead code. IRQ framework never calls this if type is not set.

Yes, but it is allowed to call

irq_set_irq_type(irq, IRQ_TYPE_NONE);

and the irq framework won't disallow it AFAICT.

> > +static void virtio_gpio_event_vq(struct virtqueue *vq)
> > +{

> > +   irq = irq_find_mapping(vgpio->gc.irq.domain, gpio);
> > +   WARN_ON(!irq);
> > +
> > +   ret = generic_handle_irq(irq);
> 
> 
> IIRC there is a new API that basically combines the two above.

generic_handle_domain_irq(), thanks.
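
(i.e. the lookup and the call collapse into one -- a sketch:

	ret = generic_handle_domain_irq(vgpio->gc.irq.domain, gpio);

replacing the irq_find_mapping() + WARN_ON() + generic_handle_irq()
sequence.)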

> >  struct virtio_gpio_config {
> > __le16 ngpio;
> > __u8 padding[2];
> > @@ -44,4 +56,17 @@ struct virtio_gpio_response_get_names {
> > __u8 value[];
> >  };
> >
> > +/* Virtio GPIO IRQ Request / Response */
> > +struct virtio_gpio_irq_request {
> > +   __le16 gpio;
> > +};
> > +
> > +struct virtio_gpio_irq_response {
> > +   __u8 status;
> > +};
> >
> >
> I’m wondering if those above should be packed.

Are you talking about the newly added ones or the ones before?

In any case, they are all already packed (i.e. they have explicit
padding wherever required) and properly aligned. Compiler won't add
any other padding to them.
 
-- 
viresh

Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-20 Thread Jie Deng



On 2021/10/19 15:46, Vincent Whitchurch wrote:

The driver currently assumes that the notify callback is only received
when the device is done with all the queued buffers.

However, this is not true, since the notify callback could be called
without any of the queued buffers being completed (for example, with
virtio-pci and shared interrupts) or with only some of the buffers being
completed (since the driver makes them available to the device in
multiple separate virtqueue_add_sgs() calls).



Can the backend driver control the time point of interrupt injection? I
can't think of why the backend has to send an early interrupt. This
operation should be avoided in the backend driver if possible. However,
this change makes sense if an early interrupt can't be avoided.




This can lead to incorrect data on the I2C bus or memory corruption in
the guest if the device operates on buffers which have been freed by
the driver.  (The WARN_ON in the driver is also triggered.)




Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-20 Thread Viresh Kumar
On 21-10-21, 13:55, Jie Deng wrote:
> Can the backend driver control the time point of interrupt injection ? I
> can't think of
> 
> why the backend has to send an early interrupt. This operation should be
> avoided
> 
> in the backend driver if possible. However, this change make sense if early
> interrupt
> 
> can't be avoid.

The backend driver probably won't send an event, but the notification
event, if it comes, shouldn't have side effects like it currently
has (where we finish the ongoing transfer by calling complete()).

-- 
viresh


RE: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-20 Thread Tian, Kevin
> From: j...@8bytes.org 
> Sent: Monday, October 18, 2021 7:38 PM
> 
> On Thu, Oct 14, 2021 at 03:00:38AM +, Tian, Kevin wrote:
> > I saw a concept of deferred attach in iommu core. See iommu_is_
> > attach_deferred(). Currently this is vendor specific and I haven't
> > looked into the exact reason why some vendor sets it now. Just
> > be curious whether the same reason might be applied to virtio-iommu.
> 
> The reason for attach_deferred is kdump support, where the IOMMU driver
> needs to keep the mappings from the old kernel until the device driver
> of the new kernel takes over.
> 

ok. Then there is no problem with the original statement:

All endpoints managed by the IOMMU should be attached to a 
domain, so global bypass isn't in use after endpoints are probed.

Thanks
Kevin


RE: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-20 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Monday, October 18, 2021 11:24 PM
> 
> On Thu, Oct 14, 2021 at 03:00:38AM +, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker 
> > > Sent: Wednesday, October 13, 2021 8:11 PM
> > >
> > > Support identity domains, allowing to only enable IOMMU protection for
> a
> > > subset of endpoints (those assigned to userspace, for example). Users
> > > may enable identity domains at compile time
> > > (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time
> > > (iommu.passthrough=1) or
> > > runtime (/sys/kernel/iommu_groups/*/type = identity).
> >
> > Do we want to use consistent terms between spec (bypass domain)
> > and code (identity domain)?
> 
> I don't think we have to. Linux uses "identity" domains and "passthrough"
> IOMMU. The old virtio-iommu feature was "bypass" so we should keep that
> for the new one, to be consistent. And then I've used "bypass" for domains
> as well, in the spec, because it would look strange to use a different
> term for the same concept. I find that it sort of falls into place: Linux'
> identity domains can be implemented either with bypass or identity-mapped
> virtio-iommu domains.

make sense.

> 
> > >
> > > Patches 1-2 support identity domains using the optional
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in
> the
> > > spec, see [1] for the latest proposal.
> > >
> > > Patches 3-5 add a fallback to identity mappings, when the feature is not
> > > supported.
> > >
> > > Note that this series doesn't touch the global bypass bit added by
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the
> IOMMU
> > > should
> > > be attached to a domain, so global bypass isn't in use after endpoints
> >
> > I saw a concept of deferred attach in iommu core. See iommu_is_
> > attach_deferred(). Currently this is vendor specific and I haven't
> > looked into the exact reason why some vendor sets it now. Just
> > be curious whether the same reason might be applied to virtio-iommu.
> >
> > > are probed. Before that, the global bypass policy is decided by the
> > > hypervisor and firmware. So I don't think Linux needs to touch the
> >
> > This reminds me one thing. The spec says that the global bypass
> > bit is sticky and not affected by reset.
> 
> The spec talks about *device* reset, triggered by software writing 0 to
> the status register, but it doesn't mention system reset. Would be good to
> clarify that. Something like:
> 
> If the device offers the VIRTIO_IOMMU_F_BYPASS_CONFIG feature, it MAY
> initialize the \field{bypass} field to 1. Field \field{bypass} SHOULD
> NOT change on device reset, but SHOULD be restored to its initial
> value on system reset.

looks good to me

> 
> > This implies that in the case
> > of rebooting the VM into a different OS, the previous OS actually
> > has the right to override this setting for the next OS. Is it a right
> > design? Even the firmware itself is unable to identify the original
> > setting enforced by the hypervisor after reboot. I feel the hypervisor
> > setting should be recovered after reset since it reflects the
> > security measure enforced by the virtual platform?
> 
> So I think clarifying system reset should address your questions.
> I believe we should leave bypass sticky across device reset, so a FW->OS
> transition, where the OS resets the device, does not open a vulnerability
> (if bypass was enabled at boot and then left disabled by FW.)
> 
> It's still a good idea for the OS to restore on shutdown the bypass value
> it was booted with. So it can kexec into an OS that doesn't support
> virtio-iommu, for example.
> 

Thanks
Kevin


RE: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-20 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Wednesday, October 13, 2021 8:11 PM
> 
> Support identity domains, allowing to only enable IOMMU protection for a
> subset of endpoints (those assigned to userspace, for example). Users
> may enable identity domains at compile time
> (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time
> (iommu.passthrough=1) or
> runtime (/sys/kernel/iommu_groups/*/type = identity).
> 
> Patches 1-2 support identity domains using the optional
> VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> spec, see [1] for the latest proposal.
> 
> Patches 3-5 add a fallback to identity mappings, when the feature is not
> supported.
> 
> Note that this series doesn't touch the global bypass bit added by
> VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU
> should
> be attached to a domain, so global bypass isn't in use after endpoints
> are probed. Before that, the global bypass policy is decided by the
> hypervisor and firmware. So I don't think Linux needs to touch the
> global bypass bit, but there are some patches available on my
> virtio-iommu/bypass branch [2] to test it.
> 
> QEMU patches are on my virtio-iommu/bypass branch [3] (and the list)
> 
> [1] https://www.mail-archive.com/virtio-dev@lists.oasis-
> open.org/msg07898.html
> [2] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/bypass
> [3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass
> 
> Jean-Philippe Brucker (5):
>   iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
>   iommu/virtio: Support bypass domains
>   iommu/virtio: Sort reserved regions
>   iommu/virtio: Pass end address to viommu_add_mapping()
>   iommu/virtio: Support identity-mapped domains
> 
>  include/uapi/linux/virtio_iommu.h |   8 ++-
>  drivers/iommu/virtio-iommu.c  | 113 +-
>  2 files changed, 101 insertions(+), 20 deletions(-)
> 

For this series:

Reviewed-by: Kevin Tian 