Re: [PATCH v2 09/20] drm/i915: Remove references to struct drm_device.pdev

2020-12-10 Thread Jani Nikula
On Tue, 08 Dec 2020, Thomas Zimmermann  wrote:
> ping for a review of the i915 patches

What did you have in mind regarding merging the series? Should we just
pick the patches up?

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-10 Thread Jason Wang


On 2020/12/9 10:00 PM, Eli Cohen wrote:

Make sure to put a DMA write memory barrier after updating the CQ consumer
index so the hardware knows that there are available CQE slots in the
queue.

Failure to do this can cause the RX doorbell record to be updated before
the CQ consumer index, resulting in CQ overrun.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
V0 -> V1
Use dma_wmb() instead of wmb()



Acked-by: Jason Wang 




  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index db87abc3cb60..43b0069ff8b1 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -479,6 +479,11 @@ static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
  static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, int num)
  {
mlx5_cq_set_ci(&mvq->cq.mcq);
+
+   /* make sure CQ consumer update is visible to the hardware before updating
+    * RX doorbell record.
+    */
+   dma_wmb();
rx_post(&mvq->vqqp, num);
if (mvq->event_cb.callback)
mvq->event_cb.callback(mvq->event_cb.private);



Re: [RFC PATCH 00/27] vDPA software assisted live migration

2020-12-10 Thread Jason Wang


On 2020/12/9 11:57 PM, Stefan Hajnoczi wrote:

On Wed, Dec 09, 2020 at 04:26:50AM -0500, Jason Wang wrote:

- Original Message -

On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote:

This series enables vDPA software assisted live migration for vhost-net
devices. This is a new method of vhost device migration: instead of
relying on the vDPA device's dirty logging capability, SW assisted LM
intercepts the dataplane, forwarding the descriptors between VM and device.

Pros:
+ vhost/vDPA devices don't need to implement dirty memory logging
+ Obsoletes ioctl(VHOST_SET_LOG_BASE) and friends

Cons:
- Not generic, relies on vhost-net-specific ioctls
- Doesn't support VIRTIO Shared Memory Regions
   https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex

I may be missing something, but my understanding is that it's the
responsibility of the device to migrate this part?

Good point. You're right.


- Performance (see below)

I think performance will be significantly lower when the shadow vq is
enabled. Imagine a vDPA device with hardware vq doorbell registers
mapped into the guest so the guest driver can directly kick the device.
When the shadow vq is enabled a vmexit is needed to write to the shadow
vq ioeventfd, then the host kernel scheduler switches to a QEMU thread
to read the ioeventfd, the descriptors are translated, QEMU writes to
the vhost hdev kick fd, the host kernel scheduler switches to the vhost
worker thread, vhost/vDPA notifies the virtqueue, and finally the
vDPA driver writes to the hardware vq doorbell register. That is a lot
of overhead compared to writing to an exitless MMIO register!

I think it's a balance. E.g. we can poll the virtqueue to have an
exitless doorbell.


If the shadow vq was implemented in drivers/vhost/ and QEMU used the
existing ioctl(VHOST_SET_LOG_BASE) approach, then the overhead would be
reduced to just one set of ioeventfd/irqfd. In other words, the QEMU
dirty memory logging happens asynchronously and isn't in the dataplane.

In addition, hardware that supports dirty memory logging as well as
software vDPA devices could completely eliminate the shadow vq for even
better performance.

Yes. That's our plan. But the interface might require more thought.

E.g. is the bitmap a good approach? To me, reporting dirty pages via a
virtqueue is better since it has a smaller footprint and is self-throttled.

And we need an address space other than the one used by the guest for
either the bitmap or the virtqueue.


But performance is a question of "is it good enough?". Maybe this
approach is okay and users don't expect good performance while dirty
memory logging is enabled.

Yes, and such a slowdown may actually help the migration converge.

Note that the whole idea is to try to have a generic solution for all
types of devices. It's good to consider performance, but for the first
stage it should be sufficient to make it work and then optimize on
top.

Moving the shadow vq to the kernel later would be quite a big change
requiring rewriting much of the code. That's why I mentioned this now
before a lot of effort is invested in a QEMU implementation.



Right.





I just wanted to share the idea of moving the
shadow vq into the kernel in case you like that approach better.

My understanding is to keep the kernel as simple as possible and leave
the policies to userspace as much as possible. E.g. it requires us to
disable doorbell mapping and irq offloading, all of which were under
the control of userspace.

If the performance is acceptable with the QEMU approach then I think
that's the best place to implement it. It looks high-overhead though so
maybe one of the first things to do is to run benchmarks to collect data
on how it performs?



Yes, I agree.

Thanks




Stefan



Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-10 Thread Mark Rutland
On Wed, Dec 09, 2020 at 07:54:26PM +0100, Thomas Gleixner wrote:
> On Wed, Dec 09 2020 at 18:15, Mark Rutland wrote:
> > In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:
> >
> > local_irq_save(flags);
> > local_irq_enable();
> >
> > [ trigger an IRQ here ]
> >
> > local_irq_restore(flags);
> >
> > ... and in check_timer() we call that a number of times after either a
> > local_irq_save() or local_irq_disable(), eventually trailing with a
> > local_irq_disable() that will balance things up before calling
> > local_irq_restore().

I gave the patchlet below a spin with my debug patch, and it boots
cleanly for me under QEMU. If you spin it as a real patch, feel free to
add:

Tested-by: Mark Rutland 

Mark.

> ---
>  arch/x86/kernel/apic/io_apic.c |   22 ++
>  1 file changed, 6 insertions(+), 16 deletions(-)
> 
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
>  static int __init timer_irq_works(void)
>  {
>   unsigned long t1 = jiffies;
> - unsigned long flags;
>  
>   if (no_timer_check)
>   return 1;
>  
> - local_save_flags(flags);
>   local_irq_enable();
> -
>   if (boot_cpu_has(X86_FEATURE_TSC))
>   delay_with_tsc();
>   else
>   delay_without_tsc();
>  
> - local_irq_restore(flags);
> -
>   /*
>* Expect a few ticks at least, to be sure some possible
>* glue logic does not lock up after one or two first
> @@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
>* least one tick may be lost due to delays.
>*/
>  
> - /* jiffies wrap? */
> - if (time_after(jiffies, t1 + 4))
> - return 1;
> - return 0;
> + local_irq_disable();
> +
> + /* Did jiffies advance? */
> + return time_after(jiffies, t1 + 4);
>  }
>  
>  /*
> @@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
>   struct irq_cfg *cfg = irqd_cfg(irq_data);
>   int node = cpu_to_node(0);
>   int apic1, pin1, apic2, pin2;
> - unsigned long flags;
>   int no_pin1 = 0;
>  
>   if (!global_clock_event)
>   return;
>  
> - local_irq_save(flags);
> + local_irq_disable();
>  
>   /*
>* get/set the timer IRQ vector:
> @@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
>   goto out;
>   }
>   panic_if_irq_remap("timer doesn't work through Interrupt-remapped IO-APIC");
> - local_irq_disable();
>   clear_IO_APIC_pin(apic1, pin1);
>   if (!no_pin1)
>   apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
> @@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
>   /*
>* Cleanup, just in case ...
>*/
> - local_irq_disable();
>   legacy_pic->mask(0);
>   clear_IO_APIC_pin(apic2, pin2);
>   apic_printk(APIC_QUIET, KERN_INFO "... failed.\n");
> @@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
>   apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
>   goto out;
>   }
> - local_irq_disable();
>   legacy_pic->mask(0);
>   apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
>   apic_printk(APIC_QUIET, KERN_INFO ". failed.\n");
> @@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
>   apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
>   goto out;
>   }
> - local_irq_disable();
>   apic_printk(APIC_QUIET, KERN_INFO ". failed :(.\n");
>   if (apic_is_x2apic_enabled())
>   apic_printk(APIC_QUIET, KERN_INFO
> @@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
>   panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
>   "report.  Then try booting with the 'noapic' option.\n");
>  out:
> - local_irq_restore(flags);
> + local_irq_enable();
>  }
>  
>  /*


Re: [RFC PATCH 05/27] vhost: Add hdev->dev.sw_lm_vq_handler

2020-12-10 Thread Stefan Hajnoczi
On Wed, Dec 09, 2020 at 04:02:56PM +0100, Eugenio Perez Martin wrote:
> On Mon, Dec 7, 2020 at 5:52 PM Stefan Hajnoczi  wrote:
> > On Fri, Nov 20, 2020 at 07:50:43PM +0100, Eugenio Pérez wrote:
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index 9179013ac4..9a69ae3598 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -2628,24 +2628,32 @@ static void virtio_net_tx_bh(void *opaque)
> > >  }
> > >  }
> > >
> > > -static void virtio_net_add_queue(VirtIONet *n, int index)
> > > +static void virtio_net_add_queue(VirtIONet *n, int index,
> > > + VirtIOHandleOutput custom_handler)
> > >  {
> >
> > We talked about the possibility of moving this into the generic vhost
> > code so that devices don't need to be modified. It would be nice to hide
> > this feature inside vhost.
> 
> I'm thinking of tying it to VirtQueue, allowing the caller to override
> the handler knowing it is not going to be called (I mean, not offering
> race-condition protection, as before starting to process
> notifications in QEMU by calling vhost_dev_disable_notifiers).

Yes, I can see how at least part of this belongs to VirtQueue.

Stefan



Re: [PATCH v2 09/20] drm/i915: Remove references to struct drm_device.pdev

2020-12-10 Thread Thomas Zimmermann

Hi

Am 10.12.20 um 10:03 schrieb Jani Nikula:

On Tue, 08 Dec 2020, Thomas Zimmermann  wrote:

ping for a review of the i915 patches


What did you have in mind regarding merging the series? Should we just
pick the patches up?


Originally I thought that individual trees would merge their respective
patches. I guess the larger drivers could get some minor conflicts.

Amdgpu/radeon devs suggested putting everything into drm-misc-next at
once. I'd prefer that, but ultimately it's up to you.


Best regards
Thomas



BR,
Jani.



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer




Re: [PATCH v2 13/20] drm/nouveau: Remove references to struct drm_device.pdev

2020-12-10 Thread Thomas Zimmermann

Hi

Am 09.12.20 um 19:04 schrieb Jeremy Cline:

Hi,

On Tue, Dec 01, 2020 at 11:35:35AM +0100, Thomas Zimmermann wrote:

Using struct drm_device.pdev is deprecated. Convert nouveau to struct
drm_device.dev. No functional changes.

Signed-off-by: Thomas Zimmermann 
Cc: Ben Skeggs 
---
  drivers/gpu/drm/nouveau/dispnv04/arb.c  | 12 +++-
  drivers/gpu/drm/nouveau/dispnv04/disp.h | 14 --
  drivers/gpu/drm/nouveau/dispnv04/hw.c   | 10 ++
  drivers/gpu/drm/nouveau/nouveau_abi16.c |  7 ---
  drivers/gpu/drm/nouveau/nouveau_acpi.c  |  2 +-
  drivers/gpu/drm/nouveau/nouveau_bios.c  | 11 ---
  drivers/gpu/drm/nouveau/nouveau_connector.c | 10 ++
  drivers/gpu/drm/nouveau/nouveau_drm.c   |  5 ++---
  drivers/gpu/drm/nouveau/nouveau_fbcon.c |  6 --
  drivers/gpu/drm/nouveau/nouveau_vga.c   | 20 
  10 files changed, 58 insertions(+), 39 deletions(-)



I believe there's a use of drm_device.pdev in
drivers/gpu/drm/nouveau/dispnv04/dfp.c in the
nv04_dfp_update_backlight() function.


Oh, I see! It's PowerPC-only.



Other than that, this looks good to me.


With the forgotten pdev fixes, can I count this as an Acked-by?

Best regards
Thomas




diff --git a/drivers/gpu/drm/nouveau/dispnv04/arb.c b/drivers/gpu/drm/nouveau/dispnv04/arb.c
index 9d4a2d97507e..1d3542d6006b 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/arb.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/arb.c
@@ -200,16 +200,17 @@ nv04_update_arb(struct drm_device *dev, int VClk, int bpp,
int MClk = nouveau_hw_get_clock(dev, PLL_MEMORY);
int NVClk = nouveau_hw_get_clock(dev, PLL_CORE);
uint32_t cfg1 = nvif_rd32(device, NV04_PFB_CFG1);
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
  
  	sim_data.pclk_khz = VClk;

sim_data.mclk_khz = MClk;
sim_data.nvclk_khz = NVClk;
sim_data.bpp = bpp;
sim_data.two_heads = nv_two_heads(dev);
-   if ((dev->pdev->device & 0x) == 0x01a0 /*CHIPSET_NFORCE*/ ||
-   (dev->pdev->device & 0x) == 0x01f0 /*CHIPSET_NFORCE2*/) {
+   if ((pdev->device & 0x) == 0x01a0 /*CHIPSET_NFORCE*/ ||
+   (pdev->device & 0x) == 0x01f0 /*CHIPSET_NFORCE2*/) {
uint32_t type;
-   int domain = pci_domain_nr(dev->pdev->bus);
+   int domain = pci_domain_nr(pdev->bus);
  
  		pci_read_config_dword(pci_get_domain_bus_and_slot(domain, 0, 1),

  0x7c, &type);
@@ -251,11 +252,12 @@ void
  nouveau_calc_arb(struct drm_device *dev, int vclk, int bpp, int *burst, int *lwm)
  {
struct nouveau_drm *drm = nouveau_drm(dev);
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
  
  	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_KELVIN)

nv04_update_arb(dev, vclk, bpp, burst, lwm);
-   else if ((dev->pdev->device & 0xfff0) == 0x0240 /*CHIPSET_C51*/ ||
-(dev->pdev->device & 0xfff0) == 0x03d0 /*CHIPSET_C512*/) {
+   else if ((pdev->device & 0xfff0) == 0x0240 /*CHIPSET_C51*/ ||
+(pdev->device & 0xfff0) == 0x03d0 /*CHIPSET_C512*/) {
*burst = 128;
*lwm = 0x0480;
} else
diff --git a/drivers/gpu/drm/nouveau/dispnv04/disp.h b/drivers/gpu/drm/nouveau/dispnv04/disp.h
index 5ace5e906949..f0a24126641a 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/disp.h
+++ b/drivers/gpu/drm/nouveau/dispnv04/disp.h
@@ -130,7 +130,7 @@ static inline bool
  nv_two_heads(struct drm_device *dev)
  {
struct nouveau_drm *drm = nouveau_drm(dev);
-   const int impl = dev->pdev->device & 0x0ff0;
+   const int impl = to_pci_dev(dev->dev)->device & 0x0ff0;
  
  	if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_CELSIUS && impl != 0x0100 &&

impl != 0x0150 && impl != 0x01a0 && impl != 0x0200)
@@ -142,14 +142,14 @@ nv_two_heads(struct drm_device *dev)
  static inline bool
  nv_gf4_disp_arch(struct drm_device *dev)
  {
-   return nv_two_heads(dev) && (dev->pdev->device & 0x0ff0) != 0x0110;
+   return nv_two_heads(dev) && (to_pci_dev(dev->dev)->device & 0x0ff0) != 0x0110;
  }
  
  static inline bool

  nv_two_reg_pll(struct drm_device *dev)
  {
struct nouveau_drm *drm = nouveau_drm(dev);
-   const int impl = dev->pdev->device & 0x0ff0;
+   const int impl = to_pci_dev(dev->dev)->device & 0x0ff0;
  
  	if (impl == 0x0310 || impl == 0x0340 || drm->client.device.info.family >= NV_DEVICE_INFO_V0_CURIE)

return true;
@@ -160,9 +160,11 @@ static inline bool
  nv_match_device(struct drm_device *dev, unsigned device,
unsigned sub_vendor, unsigned sub_device)
  {
-   return dev->pdev->device == device &&
-   dev->pdev->subsystem_vendor == sub_vendor &&
-   dev->pdev->subsystem_device == sub_device;
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+
+   return pdev->device == device &&
+   pdev->subsystem

Re: [RFC PATCH 07/27] vhost: Route guest->host notification through qemu

2020-12-10 Thread Stefan Hajnoczi
On Wed, Dec 09, 2020 at 06:08:14PM +0100, Eugenio Perez Martin wrote:
> On Mon, Dec 7, 2020 at 6:42 PM Stefan Hajnoczi  wrote:
> > On Fri, Nov 20, 2020 at 07:50:45PM +0100, Eugenio Pérez wrote:
> > > +{
> > > +struct vhost_vring_file file = {
> > > +.index = idx
> > > +};
> > > +VirtQueue *vq = virtio_get_queue(dev->vdev, idx);
> > > +VhostShadowVirtqueue *svq;
> > > +int r;
> > > +
> > > +svq = g_new0(VhostShadowVirtqueue, 1);
> > > +svq->vq = vq;
> > > +
> > > +r = event_notifier_init(&svq->hdev_notifier, 0);
> > > +assert(r == 0);
> > > +
> > > +file.fd = event_notifier_get_fd(&svq->hdev_notifier);
> > > +r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
> > > +assert(r == 0);
> > > +
> > > +return svq;
> > > +}
> >
> > I guess there are assumptions about the status of the device? Does the
> > virtqueue need to be disabled when this function is called?
> >
> 
> Yes. Maybe an assertion checking the notification state?

Sounds good.

> > > +
> > > +static int vhost_sw_live_migration_stop(struct vhost_dev *dev)
> > > +{
> > > +int idx;
> > > +
> > > +vhost_dev_enable_notifiers(dev, dev->vdev);
> > > +for (idx = 0; idx < dev->nvqs; ++idx) {
> > > +vhost_sw_lm_shadow_vq_free(dev->sw_lm_shadow_vq[idx]);
> > > +}
> > > +
> > > +return 0;
> > > +}
> > > +
> > > +static int vhost_sw_live_migration_start(struct vhost_dev *dev)
> > > +{
> > > +int idx;
> > > +
> > > +for (idx = 0; idx < dev->nvqs; ++idx) {
> > > +dev->sw_lm_shadow_vq[idx] = vhost_sw_lm_shadow_vq(dev, idx);
> > > +}
> > > +
> > > +vhost_dev_disable_notifiers(dev, dev->vdev);
> >
> > There is a race condition if the guest kicks the vq while this is
> > happening. The shadow vq hdev_notifier needs to be set so the vhost
> > device checks the virtqueue for requests that slipped in during the
> > race window.
> >
> 
> I'm not sure if I follow you. If I understand correctly,
> vhost_dev_disable_notifiers calls virtio_bus_cleanup_host_notifier,
> and the latter calls virtio_queue_host_notifier_read. That's why the
> documentation says "This might actually run the qemu handlers right
> away, so virtio in qemu must be completely setup when this is
> called.". Am I missing something?

There are at least two cases:

1. Virtqueue kicks that come before vhost_dev_disable_notifiers().
   vhost_dev_disable_notifiers() notices that and calls
   virtio_queue_notify_vq(). Will handle_sw_lm_vq() be invoked or is the
   device's vq handler function invoked?

2. Virtqueue kicks that come in after vhost_dev_disable_notifiers()
   returns. We hold the QEMU global mutex so the vCPU thread cannot
   perform MMIO/PIO dispatch immediately. The vCPU thread's
   ioctl(KVM_RUN) has already returned and will dispatch the
   MMIO/PIO access inside QEMU as soon as the global mutex is released.
   In other words, we're not taking the kvm.ko ioeventfd path but
   memory_region_dispatch_write_eventfds() should signal the ioeventfd
   that is registered at the time the dispatch occurs. Is that eventfd
   handled by handle_sw_lm_vq()?

Neither of these cases are obvious from the code. At least comments
would help but I suspect restructuring the code so the critical
ioeventfd state changes happen in a sequence would make it even clearer.



Re: [RFC PATCH 13/27] vhost: Send buffers to device

2020-12-10 Thread Stefan Hajnoczi
On Wed, Dec 09, 2020 at 07:41:23PM +0100, Eugenio Perez Martin wrote:
> On Tue, Dec 8, 2020 at 9:16 AM Stefan Hajnoczi  wrote:
> > On Fri, Nov 20, 2020 at 07:50:51PM +0100, Eugenio Pérez wrote:
> > > +while (true) {
> > > +int r;
> > > +if (virtio_queue_full(vq)) {
> > > +break;
> > > +}
> >
> > Why is this check necessary? The guest cannot provide more descriptors
> > than there is ring space. If that happens somehow then it's a driver
> > error that is already reported in virtqueue_pop() below.
> >
> 
> It's just checked because virtqueue_pop prints an error in that case,
> and there is no way to tell the difference between a regular error and
> one caused by this condition. Maybe the right thing to do is just not
> to print that error? The caller should do the error printing in that
> case. Should we return an error code?

The reason an error is printed today is because it's a guest error that
never happens with correct guest drivers. Something is broken and the
user should know about it.

Why is "virtio_queue_full" (I already forgot what that actually means,
it's not clear whether this is referring to avail elements or used
elements) a condition that should be silently ignored in shadow vqs?

Stefan



Re: [RFC PATCH 16/27] virtio: Expose virtqueue_alloc_element

2020-12-10 Thread Stefan Hajnoczi
On Wed, Dec 09, 2020 at 07:46:49PM +0100, Eugenio Perez Martin wrote:
> On Tue, Dec 8, 2020 at 9:26 AM Stefan Hajnoczi  wrote:
> >
> > On Fri, Nov 20, 2020 at 07:50:54PM +0100, Eugenio Pérez wrote:
> > > Specifying VirtQueueElement * as the return type does no harm at this moment.
> >
> > The reason for the void * return type is that C implicitly converts void
> > pointers to pointers of any type. The function takes a size_t sz
> > argument so it can allocate an object of user-defined size. The idea is
> > that the user's struct embeds a VirtQueueElement field. Changing the
> > return type to VirtQueueElement * means that callers may need to
> > explicitly cast to the user's struct type.
> >
> > It's a question of coding style but I think the void * return type
> > communicates what is going on better than VirtQueueElement *.
> 
> Right, what I meant with that is that nobody uses that feature, but I
> just re-checked and saw that contrib/vhost-user-blk actually uses it
> (not checked for more uses). I think it is better just to drop this
> commit.

contrib/vhost-user-blk doesn't use hw/virtio/virtio.c. The code is
similar and copy-pasted, but you are free to change this file without
affecting contrib/vhost-user-blk :).

I still think it's clearer to make it obvious that this function
allocates an object of generic type; at the least, the change is purely
a question of style and probably not worth making.

Stefan



Re: [PATCH v2 13/20] drm/nouveau: Remove references to struct drm_device.pdev

2020-12-10 Thread Jeremy Cline
On Thu, Dec 10, 2020 at 08:56:08AM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 09.12.20 um 19:04 schrieb Jeremy Cline:
> > Hi,
> > 
> > On Tue, Dec 01, 2020 at 11:35:35AM +0100, Thomas Zimmermann wrote:
> > > Using struct drm_device.pdev is deprecated. Convert nouveau to struct
> > > drm_device.dev. No functional changes.
> > > 
> > > Signed-off-by: Thomas Zimmermann 
> > > Cc: Ben Skeggs 
> > > ---
> > >   drivers/gpu/drm/nouveau/dispnv04/arb.c  | 12 +++-
> > >   drivers/gpu/drm/nouveau/dispnv04/disp.h | 14 --
> > >   drivers/gpu/drm/nouveau/dispnv04/hw.c   | 10 ++
> > >   drivers/gpu/drm/nouveau/nouveau_abi16.c |  7 ---
> > >   drivers/gpu/drm/nouveau/nouveau_acpi.c  |  2 +-
> > >   drivers/gpu/drm/nouveau/nouveau_bios.c  | 11 ---
> > >   drivers/gpu/drm/nouveau/nouveau_connector.c | 10 ++
> > >   drivers/gpu/drm/nouveau/nouveau_drm.c   |  5 ++---
> > >   drivers/gpu/drm/nouveau/nouveau_fbcon.c |  6 --
> > >   drivers/gpu/drm/nouveau/nouveau_vga.c   | 20 
> > >   10 files changed, 58 insertions(+), 39 deletions(-)
> > > 
> > 
> > I believe there's a use of drm_device.pdev in
> > drivers/gpu/drm/nouveau/dispnv04/dfp.c in the
> > nv04_dfp_update_backlight() function.
> 
> Oh, I see! It's PowerPC-only.
> 
> > 
> > Other than that, this looks good to me.
> 
> With the forgotten pdev fixes, can I count this as an Acked-by?
> 

Yeah, with that fix:

Reviewed-by: Jeremy Cline 

> Best regards
> Thomas
> 
> > 
> > > diff --git a/drivers/gpu/drm/nouveau/dispnv04/arb.c b/drivers/gpu/drm/nouveau/dispnv04/arb.c
> > > index 9d4a2d97507e..1d3542d6006b 100644
> > > --- a/drivers/gpu/drm/nouveau/dispnv04/arb.c
> > > +++ b/drivers/gpu/drm/nouveau/dispnv04/arb.c
> > > @@ -200,16 +200,17 @@ nv04_update_arb(struct drm_device *dev, int VClk, int bpp,
> > >   int MClk = nouveau_hw_get_clock(dev, PLL_MEMORY);
> > >   int NVClk = nouveau_hw_get_clock(dev, PLL_CORE);
> > >   uint32_t cfg1 = nvif_rd32(device, NV04_PFB_CFG1);
> > > + struct pci_dev *pdev = to_pci_dev(dev->dev);
> > >   sim_data.pclk_khz = VClk;
> > >   sim_data.mclk_khz = MClk;
> > >   sim_data.nvclk_khz = NVClk;
> > >   sim_data.bpp = bpp;
> > >   sim_data.two_heads = nv_two_heads(dev);
> > > - if ((dev->pdev->device & 0x) == 0x01a0 /*CHIPSET_NFORCE*/ ||
> > > - (dev->pdev->device & 0x) == 0x01f0 /*CHIPSET_NFORCE2*/) {
> > > + if ((pdev->device & 0x) == 0x01a0 /*CHIPSET_NFORCE*/ ||
> > > + (pdev->device & 0x) == 0x01f0 /*CHIPSET_NFORCE2*/) {
> > >   uint32_t type;
> > > - int domain = pci_domain_nr(dev->pdev->bus);
> > > + int domain = pci_domain_nr(pdev->bus);
> > >   
> > > pci_read_config_dword(pci_get_domain_bus_and_slot(domain, 0, 1),
> > > 0x7c, &type);
> > > @@ -251,11 +252,12 @@ void
> > >   nouveau_calc_arb(struct drm_device *dev, int vclk, int bpp, int *burst, int *lwm)
> > >   {
> > >   struct nouveau_drm *drm = nouveau_drm(dev);
> > > + struct pci_dev *pdev = to_pci_dev(dev->dev);
> > >   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_KELVIN)
> > >   nv04_update_arb(dev, vclk, bpp, burst, lwm);
> > > - else if ((dev->pdev->device & 0xfff0) == 0x0240 /*CHIPSET_C51*/ ||
> > > -  (dev->pdev->device & 0xfff0) == 0x03d0 /*CHIPSET_C512*/) {
> > > + else if ((pdev->device & 0xfff0) == 0x0240 /*CHIPSET_C51*/ ||
> > > +  (pdev->device & 0xfff0) == 0x03d0 /*CHIPSET_C512*/) {
> > >   *burst = 128;
> > >   *lwm = 0x0480;
> > >   } else
> > > diff --git a/drivers/gpu/drm/nouveau/dispnv04/disp.h b/drivers/gpu/drm/nouveau/dispnv04/disp.h
> > > index 5ace5e906949..f0a24126641a 100644
> > > --- a/drivers/gpu/drm/nouveau/dispnv04/disp.h
> > > +++ b/drivers/gpu/drm/nouveau/dispnv04/disp.h
> > > @@ -130,7 +130,7 @@ static inline bool
> > >   nv_two_heads(struct drm_device *dev)
> > >   {
> > >   struct nouveau_drm *drm = nouveau_drm(dev);
> > > - const int impl = dev->pdev->device & 0x0ff0;
> > > + const int impl = to_pci_dev(dev->dev)->device & 0x0ff0;
> > >   if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_CELSIUS && impl != 0x0100 &&
> > >   impl != 0x0150 && impl != 0x01a0 && impl != 0x0200)
> > > @@ -142,14 +142,14 @@ nv_two_heads(struct drm_device *dev)
> > >   static inline bool
> > >   nv_gf4_disp_arch(struct drm_device *dev)
> > >   {
> > > - return nv_two_heads(dev) && (dev->pdev->device & 0x0ff0) != 0x0110;
> > > + return nv_two_heads(dev) && (to_pci_dev(dev->dev)->device & 0x0ff0) != 0x0110;
> > >   }
> > >   static inline bool
> > >   nv_two_reg_pll(struct drm_device *dev)
> > >   {
> > >   struct nouveau_drm *drm = nouveau_drm(dev);
> > > - const int impl = dev->pdev->device & 0x0ff0;
> > > + const 

x86/ioapic: Cleanup the timer_works() irqflags mess

2020-12-10 Thread Thomas Gleixner
Mark tripped over the creative irqflags handling in the IO-APIC timer
delivery check which ends up doing:

local_irq_save(flags);
local_irq_enable();
local_irq_restore(flags);

which triggered a new consistency check he's working on required for
replacing the POPF based restore with a conditional STI.

That code is a historical mess and none of this is needed. Make it use
plain local_irq_disable()/enable(), as that's all that is required. It
is invoked from interrupt-enabled code nowadays.

Reported-by: Mark Rutland 
Signed-off-by: Thomas Gleixner 
Tested-by: Mark Rutland 
---
 arch/x86/kernel/apic/io_apic.c |   22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
 static int __init timer_irq_works(void)
 {
unsigned long t1 = jiffies;
-   unsigned long flags;
 
if (no_timer_check)
return 1;
 
-   local_save_flags(flags);
local_irq_enable();
-
if (boot_cpu_has(X86_FEATURE_TSC))
delay_with_tsc();
else
delay_without_tsc();
 
-   local_irq_restore(flags);
-
/*
 * Expect a few ticks at least, to be sure some possible
 * glue logic does not lock up after one or two first
@@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
 * least one tick may be lost due to delays.
 */
 
-   /* jiffies wrap? */
-   if (time_after(jiffies, t1 + 4))
-   return 1;
-   return 0;
+   local_irq_disable();
+
+   /* Did jiffies advance? */
+   return time_after(jiffies, t1 + 4);
 }
 
 /*
@@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
struct irq_cfg *cfg = irqd_cfg(irq_data);
int node = cpu_to_node(0);
int apic1, pin1, apic2, pin2;
-   unsigned long flags;
int no_pin1 = 0;
 
if (!global_clock_event)
return;
 
-   local_irq_save(flags);
+   local_irq_disable();
 
/*
 * get/set the timer IRQ vector:
@@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
goto out;
}
   panic_if_irq_remap("timer doesn't work through Interrupt-remapped IO-APIC");
-   local_irq_disable();
clear_IO_APIC_pin(apic1, pin1);
if (!no_pin1)
apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
/*
 * Cleanup, just in case ...
 */
-   local_irq_disable();
legacy_pic->mask(0);
clear_IO_APIC_pin(apic2, pin2);
apic_printk(APIC_QUIET, KERN_INFO "... failed.\n");
@@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
legacy_pic->mask(0);
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
apic_printk(APIC_QUIET, KERN_INFO ". failed.\n");
@@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
apic_printk(APIC_QUIET, KERN_INFO ". failed :(.\n");
if (apic_is_x2apic_enabled())
apic_printk(APIC_QUIET, KERN_INFO
@@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
"report.  Then try booting with the 'noapic' option.\n");
 out:
-   local_irq_restore(flags);
+   local_irq_enable();
 }
 
 /*


Re: [PATCH net] vhost_net: fix high cpu load when sendmsg fails

2020-12-10 Thread Jason Wang


On 2020/12/9 9:27 PM, wangyunjian wrote:

-Original Message-
From: Michael S. Tsirkin [mailto:m...@redhat.com]
Sent: Wednesday, December 9, 2020 8:50 PM
To: wangyunjian 
Cc: jasow...@redhat.com; virtualization@lists.linux-foundation.org;
net...@vger.kernel.org; Lilijun (Jerry) ;
chenchanghu ; xudingke 
Subject: Re: [PATCH net] vhost_net: fix high cpu load when sendmsg fails

On Wed, Dec 09, 2020 at 07:48:24PM +0800, wangyunjian wrote:

From: Yunjian Wang 

Currently we break the loop and wake up the vhost_worker when sendmsg
fails. When the worker wakes up again, we'll meet the same error. This
will cause high CPU load. To fix this issue, we can skip this
descriptor by ignoring the error.

Signed-off-by: Yunjian Wang 
---
  drivers/vhost/net.c | 24 +---
  1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index
531a00d703cd..ac950b1120f5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -829,14 +829,8 @@ static void handle_tx_copy(struct vhost_net *net,
struct socket *sock)

/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(sock, &msg, len);
-   if (unlikely(err < 0)) {
-   vhost_discard_vq_desc(vq, 1);
-   vhost_net_enable_vq(net, vq);
-   break;
-   }
-   if (err != len)
-   pr_debug("Truncated TX packet: len %d != %zd\n",
-err, len);
+   if (unlikely(err < 0 || err != len))
+   vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err,
+len);
  done:
vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
vq->heads[nvq->done_idx].len = 0;

One of the reasons for sendmsg to fail is ENOBUFS.
In that case we surely don't want to drop the packet.

Now the functions tap_sendmsg()/tun_sendmsg() don't return ENOBUFS.



I think not; it can happen if we exceed sndbuf. E.g. see tun_alloc_skb().

Thanks





There could be other transient errors.
Which error did you encounter, specifically?

Currently a guest VM sends an skb whose length is more than 64k.
If the virtio hdr is wrong, the problem will also be triggered.

Thanks


@@ -925,19 +919,11 @@ static void handle_tx_zerocopy(struct vhost_net
*net, struct socket *sock)

/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(sock, &msg, len);
-   if (unlikely(err < 0)) {
-   if (zcopy_used) {
+   if (unlikely(err < 0 || err != len)) {
+   if (zcopy_used && err < 0)
vhost_net_ubuf_put(ubufs);
-   nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
-   % UIO_MAXIOV;
-   }
-   vhost_discard_vq_desc(vq, 1);
-   vhost_net_enable_vq(net, vq);
-   break;
+   vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err,
+len);
}
-   if (err != len)
-   pr_debug("Truncated TX packet: "
-" len %d != %zd\n", err, len);
if (!zcopy_used)
vhost_add_used_and_signal(&net->dev, vq, head, 0);
else
--
2.23.0



Re: x86/ioapic: Cleanup the timer_works() irqflags mess

2020-12-10 Thread Jürgen Groß via Virtualization

On 10.12.20 21:15, Thomas Gleixner wrote:

Mark tripped over the creative irqflags handling in the IO-APIC timer
delivery check which ends up doing:

local_irq_save(flags);
local_irq_enable();
local_irq_restore(flags);

which triggered a new consistency check he's working on required for
replacing the POPF based restore with a conditional STI.

That code is a historical mess and none of this is needed. Make it use
plain local_irq_disable()/enable(), as that's all that is required. It
is invoked from interrupt-enabled code nowadays.

Reported-by: Mark Rutland 
Signed-off-by: Thomas Gleixner 
Tested-by: Mark Rutland 


Reviewed-by: Juergen Gross 


Juergen

