Hi,
In our XenServer testing, I have seen cases where, when we boot 50 Windows
VMs together, the dom0 kernel soft locks up.

The following is a brief description of what causes the softlockup
detection code to kick in. An HVM domain boot generates around
200K (evtchn:qemu-dm xen-dyn) interrupts in a short period of time. All of
these evtchn:qemu-dm interrupts are bound to VCPU 0 until irqbalance sees
the IRQs and moves them to a different VCPU. In the case of XenServer,
irqbalance runs every 10 seconds, which means it doesn't get to see this
burst of interrupts and doesn't rebalance them most of the time, so all
evtchn:qemu-dm interrupts end up being processed by VCPU0. This causes
VCPU0 to spend most of its time processing hardirqs and very little time
on softirqs. Moreover, in the XenServer dom0, PREEMPTION is disabled,
which means VCPU0 never runs the watchdog (process context), triggering
the softlockup detection code to panic.
One way to solve this problem would be to bind evtchn:qemu-dm to the next
online VCPU, which would spread hardirq processing evenly across the
VCPUs. Later, irqbalance can try to rebalance evtchn:qemu-dm if required.
I am not sure if this is a sensible thing to do. I have pseudo code
(not the final patch) which tries to bind only
IOCTL_EVTCHN_BIND_INTERDOMAIN (used by qemu-dm) to the next online VCPU:
 /* Rebind an evtchn so that it gets delivered to a specific cpu */
-static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu)
+int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu)
 {
 	struct evtchn_bind_vcpu bind_vcpu;
-	int evtchn = evtchn_from_irq(irq);
 	int masked;
 
 	if (!VALID_EVTCHN(evtchn))
@@ -1339,6 +1338,12 @@ static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu)
 	return 0;
 }
+EXPORT_SYMBOL_GPL(xen_rebind_evtchn_to_cpu);
+
+static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu)
+{
+	return xen_rebind_evtchn_to_cpu(evtchn_from_irq(irq), tcpu);
+}
 
 static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
 			    bool force)
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 7efd1cb..bd34d3d 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -471,6 +471,7 @@ static long evtchn_ioctl(struct file *file,
 	case IOCTL_EVTCHN_BIND_INTERDOMAIN: {
 		struct ioctl_evtchn_bind_interdomain bind;
 		struct evtchn_bind_interdomain bind_interdomain;
+		static unsigned int selected_cpu;
 
 		rc = -EFAULT;
 		if (copy_from_user(&bind, uarg, sizeof(bind)))
@@ -489,8 +490,13 @@ static long evtchn_ioctl(struct file *file,
 			break;
 
 		rc = evtchn_bind_to_user(u, bind_interdomain.local_port);
-		if (rc == 0)
+		if (rc == 0) {
 			rc = bind_interdomain.local_port;
+			selected_cpu = cpumask_next(selected_cpu, cpu_online_mask);
+			if (selected_cpu >= nr_cpu_ids)
+				selected_cpu = cpumask_first(cpu_online_mask);
+			xen_rebind_evtchn_to_cpu(rc, selected_cpu);
+		}
 		break;
 	}
 
diff --git a/include/xen/events.h b/include/xen/events.h
index 88da2ab..f442ca5 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,6 +58,7 @@ void evtchn_put(unsigned int evtchn);
 
 void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector);
 void rebind_evtchn_irq(int evtchn, int irq);
+int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu);
 
 static inline void notify_remote_via_evtchn(int port)
 {
"selected_cpu" needs to be protected, but I would like to avoid taking a
lock. One way to avoid taking lock (before xen_rebind_evtchn_to_cpu())
would be to use "local_port%num_present_cpus()" or " smp_processor_id()"
as index into cpumask_next.
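
For illustration, a lockless variant along the lines of the local_port idea
could look roughly like this (untested sketch; evtchn_pick_cpu() is just an
illustrative name, not part of the patch above, and it relies only on the
cpumask helpers that drivers/xen/evtchn.c already pulls in):

static unsigned int evtchn_pick_cpu(unsigned int local_port)
{
	/* Derive an index from the port, so no shared counter (or lock) is needed. */
	unsigned int idx = local_port % num_present_cpus();
	unsigned int cpu = cpumask_first(cpu_online_mask);

	/* Walk the online mask idx steps, wrapping around if we run off the end. */
	while (idx--) {
		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
	}

	return cpu;
}

The IOCTL_EVTCHN_BIND_INTERDOMAIN path would then call
xen_rebind_evtchn_to_cpu(rc, evtchn_pick_cpu(bind_interdomain.local_port))
instead of updating selected_cpu. The spread is only as even as the local
port numbers Xen hands out, and the smp_processor_id() variant would have
the same shape, just with the calling CPU as the starting index.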
What do you guys suggest?
Thanks,
Anoob.