Re: [PATCH v3 1/2] virtio-rng: fix stuck of hot-unplugging busy device
On (Wed) 10 Sep 2014 [14:11:36], Amos Kong wrote:
> When we try to hot-remove a busy virtio-rng device from QEMU monitor,
> the device can't be hot-removed. Because virtio-rng driver hangs at
> wait_for_completion_killable().
>
> This patch exits the waiting by completing have_data completion before
> unregistering, resets data_avail to avoid the hwrng core use wrong
> buffer bytes.
>
> Signed-off-by: Amos Kong
> Cc: sta...@vger.kernel.org

Reviewed-by: Amit Shah

		Amit
Re: [PATCH v3 2/2] virtio-rng: skip reading when we start to remove the device
On (Wed) 10 Sep 2014 [14:11:37], Amos Kong wrote:
> Before we really unregister the hwrng device, reading will get stuck if
> the virtio device is reset. We should return error for reading when we
> start to remove the device.
>
> Signed-off-by: Amos Kong
> Cc: sta...@vger.kernel.org

Reviewed-by: Amit Shah

		Amit
Re: RFC virtio-rng: fail to read sysfs of a busy device
On Wed, Sep 10, 2014 at 02:49:38PM +0800, Amos Kong wrote: > On Wed, Sep 10, 2014 at 11:22:12AM +0530, Amit Shah wrote: > > On (Tue) 09 Sep 2014 [23:23:07], Amos Kong wrote: > > > (Resend to fix the subject) > > > > > > Hi Amit, Rusty > > > > > > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062 > > > steps: > > > - Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest > > > - check sysfs files in the same time, 'cat > > > /sys/class/misc/hw_random/rng_*' > > > > > > Result: cat process will get stuck, it will return if we kill dd process. > > > > How common is it going to be to have a long-running 'dd' process on > > /dev/hwrng? > > Not a common usage, but we have this strict testing. For -smp 1: It's easy to reproduce with slow backend (/dev/random). cat can return most of time with some delay if we use quick backend (/dev/urandom). But for -smp 2: I didn't touch this problem even with slow backend. > > Also, with the new khwrng thread, reading from /dev/hwrng isn't > > required -- just use /dev/random? > > Yes. > > > (This doesn't mean we shouldn't fix the issue here...) > > Completely agree :-) > > > > We have some static variables (eg, current_rng, data_avail, etc) in > > > hw_random/core.c, > > > they are protected by rng_mutex. I try to workaround this issue by > > > undelay(100) > > > after mutex_unlock() in rng_dev_read(). This gives chance for > > > hwrng_attr_*_show() > > > to get mutex. > > > > > > This patch also contains some cleanup, moving some code out of mutex > > > protection. > > > > > > Do you have some suggestion? Thanks. > > > > > > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > > index aa30a25..fa69020 100644 > > > --- a/drivers/char/hw_random/core.c > > > +++ b/drivers/char/hw_random/core.c > > > @@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char > > > __user *buf, > > > } > > > > > > mutex_unlock(&rng_mutex); > > > + udelay(100); > > > > We have a need_resched() right below. Why doesn't that work? [smp 1] Why need_resched() always return zero? what's the original purpose of it ? > > > > if (need_resched()) > > It never success in my debugging. > > If we remove this check and always call schedule_timeout_interruptible(1), > problem also disappears. > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index aa30a25..263a370 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, > char __user *buf, > > mutex_unlock(&rng_mutex); > > - if (need_resched()) > - schedule_timeout_interruptible(1); > + schedule_timeout_interruptible(1); > > if (signal_pending(current)) { > err = -ERESTARTSYS; > > > > schedule_timeout_interruptible(1); > > > @@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct > > > device *dev, > > > int err; > > > struct hwrng *rng; > > > The following hunk doesn't work: > > > > > + err = -ENODEV; > > > err = mutex_lock_interruptible(&rng_mutex); > > > > err is being set to another value in the next line! > > > > > if (err) > > > return -ERESTARTSYS; > > > - err = -ENODEV; > > > > And all usage of err below now won't have -ENODEV but some other value. > > Oops! 
> > > > list_for_each_entry(rng, &rng_list, list) { > > > if (strcmp(rng->name, buf) == 0) { > > > if (rng == current_rng) { > > > @@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device > > > *dev, > > > return -ERESTARTSYS; > > > if (current_rng) > > > name = current_rng->name; > > > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > > mutex_unlock(&rng_mutex); > > > + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > > > This looks OK... > > > > > > > > return ret; > > > } > > > @@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct > > > device *dev, > > > ssize_t ret = 0; > > > struct hwrng *rng; > > > > > > + buf[0] = '\0'; > > > err = mutex_lock_interruptible(&rng_mutex); > > > if (err) > > > return -ERESTARTSYS; > > > > > > - buf[0] = '\0'; > > > list_for_each_entry(rng, &rng_list, list) { > > > strncat(buf, rng->name, PAGE_SIZE - ret - 1); > > > ret += strlen(rng->name); > > > strncat(buf, " ", PAGE_SIZE - ret - 1); > > > ret++; > > > } > > > + mutex_unlock(&rng_mutex); > > > strncat(buf, "\n", PAGE_SIZE - ret - 1); > > > ret++; > > > - mutex_unlock(&rng_mutex); > > > > But this isn't resulting in savings; the majority of the time is being > > spent in the for loop, and that writes to the buffer. > > Right > > > BTW I don't expect strcat'ing to the buf in each of these scenarios is > > a long operati
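As a rough illustration of the loop being debated in this thread (a simplified sketch with an invented helper name, not the verbatim drivers/char/hw_random/core.c code), the shape of rng_dev_read() and the point where the sysfs readers starve is roughly:

/*
 * Sketch only: shows where the unlock and the sleep sit relative to a
 * sysfs reader blocked on rng_mutex.  On a single-vCPU guest with a slow
 * backend, the loop re-takes rng_mutex almost immediately, and
 * need_resched() stays false because nothing has marked this task for
 * preemption -- hence the proposal to sleep unconditionally.
 */
static ssize_t rng_read_loop_sketch(char __user *buf, size_t size)
{
	ssize_t ret = 0;
	int err;

	while (size) {
		err = mutex_lock_interruptible(&rng_mutex);
		if (err)
			return -ERESTARTSYS;

		/* ... copy available random bytes to buf, update ret/size ... */

		mutex_unlock(&rng_mutex);

		/*
		 * An unconditional sleep here lets a concurrent
		 * "cat /sys/class/misc/hw_random/rng_*" grab rng_mutex;
		 * "if (need_resched())" almost never fires in the
		 * single-vCPU case described above.
		 */
		schedule_timeout_interruptible(1);

		if (signal_pending(current)) {
			if (!ret)
				ret = -ERESTARTSYS;
			break;
		}
	}
	return ret;
}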
Re: Question about using "qemu-kvm" to create vm including "-mem-path"
Il 10/09/2014 01:54, Min Du ha scritto:
> However if I add "root=/dev/vda1" in the command to create vm, which
> looks like "-append root=/dev/vda1 console=ttyS0" in command line, then
> "-mem-path /hugetlbfs" doesn't work:
> [root@mind domain]# tail /proc/meminfo
> VmallocChunk:    34359481380 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:   1390592 kB
> HugePages_Total:     512
> HugePages_Free:      512
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:        6144 kB
> DirectMap2M:    12552192 kB
>
> Does anyone know how does "root=/dev/vda1" conflict with hugepages
> and how do I use both of them to create a VM? I am actually using an xml
> file to create VM, and those commands I mentioned above are shown in "ps
> aux | grep qemu".

There should be no conflict. Try reproducing with a more recent version of QEMU.

Paolo
Re: Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
Il 09/09/2014 13:12, Venkateswara Rao Nandigam ha scritto:
> I have tried Freebsd10.0 64bit VM on the KVM Host running RHEL 6.4, processor
> Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz.

I suggest you contact Red Hat for problems with the RHEL kernel. Even if
FreeBSD is technically not supported, they may try to help you gathering
traces and other information that is useful to engineers.

Paolo

> The Freebsd VM hangs at the "booting... " prompt.
>
> If I boot the host kernel with "nosmep", then Freebsd VM boots up fine. I
> know Xeon V2 processors are having the smep feature.
>
> Any ideas/solutions on how to boot Freebsd VM with smep option enabled in
> Host Kernel.
>
> Thanks,
> Venkatesh
Re: [PATCH 4/4] vfio/pci: Restore MSIx message prior to enabling
On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote: >The MSIx vector table lives in device memory, which may be cleared as >part of a backdoor device reset. This is the case on the IBM IPR HBA >when the BIST is run on the device. When assigned to a QEMU guest, >the guest driver does a pci_save_state(), issues a BIST, then does a >pci_restore_state(). The BIST clears the MSIx vector table, but due >to the way interrupts are configured the pci_restore_state() does not >restore the vector table as expected. Eventually this results in an >EEH error on Power platforms when the device attempts to signal an >interrupt with the zero'd table entry. > >Fix the problem by restoring the host cached MSI message prior to >enabling each vector. > >Reported-by: Wen Xiong >Signed-off-by: Gavin Shan >Signed-off-by: Alex Williamson Alex, please let me know if I need resend this one to you. The patch has been pending for long time, I'm not sure if you still can grab it somewhere. As you might see, Bjorn will take that one with PCI changes. This patch depends on the changes. Thanks, Gavin >--- > drivers/vfio/pci/vfio_pci_intrs.c | 15 +++ > 1 file changed, 15 insertions(+) > >diff --git a/drivers/vfio/pci/vfio_pci_intrs.c >b/drivers/vfio/pci/vfio_pci_intrs.c >index 9dd49c9..553212f 100644 >--- a/drivers/vfio/pci/vfio_pci_intrs.c >+++ b/drivers/vfio/pci/vfio_pci_intrs.c >@@ -16,6 +16,7 @@ > #include > #include > #include >+#include > #include > #include > #include >@@ -548,6 +549,20 @@ static int vfio_msi_set_vector_signal(struct >vfio_pci_device *vdev, > return PTR_ERR(trigger); > } > >+ /* >+ * The MSIx vector table resides in device memory which may be cleared >+ * via backdoor resets. We don't allow direct access to the vector >+ * table so even if a userspace driver attempts to save/restore around >+ * such a reset it would be unsuccessful. To avoid this, restore the >+ * cached value of the message prior to enabling. >+ */ >+ if (msix) { >+ struct msi_msg msg; >+ >+ get_cached_msi_msg(irq, &msg); >+ write_msi_msg(irq, &msg); >+ } >+ > ret = request_irq(irq, vfio_msihandler, 0, > vdev->ctx[vector].name, trigger); > if (ret) { >-- >1.8.3.2 > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
When I check hwrng attributes in sysfs, cat process always gets stuck if guest has only 1 vcpu and uses a slow rng backend. Currently we check if there is any tasks waiting to be run on current cpu in rng_dev_read() by need_resched(). But need_resched() doesn't work because rng_dev_read() is executing in user context. This patch removed need_resched() and increase delay to 10 jiffies, then other tasks can have chance to execute protected code. Delaying 1 jiffy also works, but 10 jiffies is safer. Signed-off-by: Amos Kong --- drivers/char/hw_random/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index c591d7e..b5d1b6f 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(&rng_mutex); - if (need_resched()) - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(10); if (signal_pending(current)) { err = -ERESTARTSYS; -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] virtio-rng cleanup: move some code out of mutex protection
It doesn't save too much cpu time as expected, just a cleanup. Signed-off-by: Amos Kong --- drivers/char/hw_random/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25..c591d7e 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, return -ERESTARTSYS; if (current_rng) name = current_rng->name; - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); mutex_unlock(&rng_mutex); + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); return ret; } @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, ssize_t ret = 0; struct hwrng *rng; + buf[0] = '\0'; err = mutex_lock_interruptible(&rng_mutex); if (err) return -ERESTARTSYS; - buf[0] = '\0'; list_for_each_entry(rng, &rng_list, list) { strncat(buf, rng->name, PAGE_SIZE - ret - 1); ret += strlen(rng->name); strncat(buf, " ", PAGE_SIZE - ret - 1); ret++; } + mutex_unlock(&rng_mutex); strncat(buf, "\n", PAGE_SIZE - ret - 1); ret++; - mutex_unlock(&rng_mutex); return ret; } -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] fix stuck in catting hwrng attributes
If we read from hwrng with a long-running dd process, it takes too much CPU time, and when we then check the hwrng attributes from sysfs with cat, the cat process gets stuck. The problem can only be reproduced with a non-SMP guest and a slow backend.

This patchset changes the hwrng core to always delay for 10 jiffies, so the cat process gets a chance to execute the protected code and the problem is resolved.

Thanks.

Amos Kong (2):
  virtio-rng cleanup: move some code out of mutex protection
  virtio-rng: fix stuck in catting hwrng attributes

 drivers/char/hw_random/core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
1.9.3
Re: [PATCH 0/2] KVM API documentation patches
Il 09/09/2014 18:27, Alex Bennée ha scritto:
> Hi,
>
> I'm preparing to add ARM KVM GDB support and I went to read the API
> documentation and found it surprisingly mute on the subject ;-)
>
> The first patch documents the "new" KVM_SET_GUEST_DEBUG ioctl based on
> reviewing the code. I've included a long CC list of people who've
> actually done the various implementations who I hope can sanity check
> the write-up. The second is a trivial formatting fix for what looks
> like a minor merge trip-up.
>
> Alex Bennée (2):
>   KVM: document KVM_SET_GUEST_DEBUG api
>   KVM: fix api documentation of KVM_GET_EMULATED_CPUID
>
>  Documentation/virtual/kvm/api.txt | 184 +++---
>  1 file changed, 114 insertions(+), 70 deletions(-)

Thanks, applying both!

Paolo
Re: [RFC PATCH v6 15/20] vfio/platform: support for maskable and automasked interrupts
On Tue, Sep 02, 2014 at 06:06:17PM +0200, Antonios Motakis wrote: > On Sun, Jun 8, 2014 at 12:17 PM, Christoffer Dall > wrote: > > On Thu, Jun 05, 2014 at 07:03:23PM +0200, Antonios Motakis wrote: > >> Adds support to mask interrupts, and also for automasked interrupts. > >> Level sensitive interrupts are exposed as automasked interrupts and > >> are masked and disabled automatically when they fire. > >> > >> Signed-off-by: Antonios Motakis > >> --- > >> drivers/vfio/platform/vfio_platform_irq.c | 112 > >> -- > >> drivers/vfio/platform/vfio_platform_private.h | 2 + > >> 2 files changed, 109 insertions(+), 5 deletions(-) > >> > >> diff --git a/drivers/vfio/platform/vfio_platform_irq.c > >> b/drivers/vfio/platform/vfio_platform_irq.c > >> index d79f5af..10dfbf0 100644 > >> --- a/drivers/vfio/platform/vfio_platform_irq.c > >> +++ b/drivers/vfio/platform/vfio_platform_irq.c > >> @@ -51,9 +51,17 @@ int vfio_platform_irq_init(struct vfio_platform_device > >> *vdev) > >> if (hwirq < 0) > >> goto err; > >> > >> - vdev->irq[i].flags = VFIO_IRQ_INFO_EVENTFD; > >> + spin_lock_init(&vdev->irq[i].lock); > >> + > >> + vdev->irq[i].flags = VFIO_IRQ_INFO_EVENTFD > >> + | VFIO_IRQ_INFO_MASKABLE; > >> + > >> + if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK) > >> + vdev->irq[i].flags |= VFIO_IRQ_INFO_AUTOMASKED; > > > > This seems to rely on the fact that you had actually loaded a driver for > > your device to set the right type. Is this assumption always correct? > > > > It seems to me that this configuration bit should now be up to your user > > space drive who is the best candidate to know details about your device > > at this point? > > > > Hm, I see this type being set usually either in a device tree source, > or in the support code for a specific platform. Are there any > situations where this is actually set by the driver? If I understand > right this is not the case for the PL330, but if it is possible that > it is the case for another device then I need to rethink this. Though > as far as I understand this should not be the case. > Wow, this has been incredibly long time since I looked at this code, so not sure if I remember my original reasoning anymore, however, while device properties are set in the DT, they would only be available to this code if you actually loaded a device driver for that device, right? I'm just not sure that assumption always holds for devices used by VFIO, but I'm really not sure anymore. Maybe I'm rambling. 
> >> + > >> vdev->irq[i].count = 1; > >> vdev->irq[i].hwirq = hwirq; > >> + vdev->irq[i].masked = false; > >> } > >> > >> vdev->num_irqs = cnt; > >> @@ -77,11 +85,27 @@ void vfio_platform_irq_cleanup(struct > >> vfio_platform_device *vdev) > >> > >> static irqreturn_t vfio_irq_handler(int irq, void *dev_id) > >> { > >> - struct eventfd_ctx *trigger = dev_id; > >> + struct vfio_platform_irq *irq_ctx = dev_id; > >> + unsigned long flags; > >> + int ret = IRQ_NONE; > >> + > >> + spin_lock_irqsave(&irq_ctx->lock, flags); > >> + > >> + if (!irq_ctx->masked) { > >> + ret = IRQ_HANDLED; > >> + > >> + if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) { > >> + disable_irq_nosync(irq_ctx->hwirq); > >> + irq_ctx->masked = true; > >> + } > >> + } > >> > >> - eventfd_signal(trigger, 1); > >> + spin_unlock_irqrestore(&irq_ctx->lock, flags); > >> > >> - return IRQ_HANDLED; > >> + if (ret == IRQ_HANDLED) > >> + eventfd_signal(irq_ctx->trigger, 1); > >> + > >> + return ret; > >> } > >> > >> static int vfio_set_trigger(struct vfio_platform_device *vdev, > >> @@ -162,6 +186,82 @@ static int vfio_platform_set_irq_trigger(struct > >> vfio_platform_device *vdev, > >> return -EFAULT; > >> } > >> > >> +static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev, > >> + unsigned index, unsigned start, > >> + unsigned count, uint32_t flags, void > >> *data) > >> +{ > >> + uint8_t arr; > > > > > > arr? > > arr for array! As in, the VFIO API allows an array of IRQs. However > for platform devices we don't use this, each IRQ is exposed > independently as an array of 1 IRQ. > but it's not an array. If it contains IRQs, call it irqs. Unless this is referring specifically to a field *named* array, I don't remember the API at current, but reading the code along it didn't make sense to me to have a uint8_t called arr, and code should make as much sense independenly as possible. This reminds me of people writing code like: String str; yuck. > > > >> + > >> + if (start != 0 || count != 1) > >> + return -EINVAL; > >> + > >> + swit
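The quoted hunk is cut off right after the "arr" declaration, so as a rough guess at what the unmask path does with one automasked interrupt (an assumption for illustration only, not the patch's actual code):

/*
 * Assumed-for-illustration sketch of unmasking a single automasked IRQ,
 * using only fields visible in the hunks above (lock, masked, hwirq).
 * The real handler also has to parse the VFIO_IRQ_SET_DATA_* flags and
 * the user-supplied array; that part is omitted here.
 */
static void sketch_unmask_one(struct vfio_platform_irq *irq_ctx)
{
	unsigned long flags;

	spin_lock_irqsave(&irq_ctx->lock, flags);
	if (irq_ctx->masked) {
		enable_irq(irq_ctx->hwirq);	/* re-arm the level-triggered line */
		irq_ctx->masked = false;
	}
	spin_unlock_irqrestore(&irq_ctx->lock, flags);
}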
[PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast
Currently, we call ioapic_service() immediately when we find the irq is still active during eoi broadcast. But for real hardware, there's some dealy between the EOI writing and irq delivery (system bus latency?). So we need to emulate this behavior. Otherwise, for a guest who haven't register a proper irq handler , it would stay in the interrupt routine as this irq would be re-injected immediately after guest enables interrupt. This would lead guest can't move forward and may miss the possibility to get proper irq handler registered (one example is windows guest resuming from hibernation). As there's no way to differ the unhandled irq from new raised ones, this patch solve this problems by scheduling a delayed work when the count of irq injected during eoi broadcast exceeds a threshold value. After this patch, the guest can move a little forward when there's no suitable irq handler in case it may register one very soon and for guest who has a bad irq detection routine ( such as note_interrupt() in linux ), this bad irq would be recognized soon as in the past. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Signed-off-by: Zhang Haoyu --- include/trace/events/kvm.h | 20 ++ virt/kvm/ioapic.c | 51 -- virt/kvm/ioapic.h | 6 ++ 3 files changed, 75 insertions(+), 2 deletions(-) diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 908925a..b05f688 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -95,6 +95,26 @@ TRACE_EVENT(kvm_ioapic_set_irq, __entry->coalesced ? " (coalesced)" : "") ); +TRACE_EVENT(kvm_ioapic_delayed_eoi_inj, + TP_PROTO(__u64 e), + TP_ARGS(e), + + TP_STRUCT__entry( + __field(__u64, e ) + ), + + TP_fast_assign( + __entry->e = e; + ), + + TP_printk("dst %x vec=%u (%s|%s|%s%s)%s", + (u8)(__entry->e >> 56), (u8)__entry->e, + __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode), + (__entry->e & (1<<11)) ? "logical" : "physical", + (__entry->e & (1<<15)) ? "level" : "edge", + (__entry->e & (1<<16)) ? "|masked" : "") +); + TRACE_EVENT(kvm_msi_set_irq, TP_PROTO(__u64 address, __u64 data), TP_ARGS(address, data), diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..a36c4c4 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -405,6 +405,24 @@ void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id) spin_unlock(&ioapic->lock); } +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{ + int i, ret; + struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, +eoi_inject.work); + spin_lock(&ioapic->lock); + for (i = 0; i < IOAPIC_NUM_PINS; i++) { + union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + + if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) + continue; + + if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) + ret = ioapic_service(ioapic, i, false); + } + spin_unlock(&ioapic->lock); +} + static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, struct kvm_ioapic *ioapic, int vector, int trigger_mode) { @@ -435,8 +453,33 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); ent->fields.remote_irr = 0; - if (ioapic->irr & (1 << i)) - ioapic_service(ioapic, i, false); + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { + ++ioapic->irq_eoi[i]; + if (ioapic->irq_eoi[i] == IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) { + /* +* Real hardware does not deliver the irq so +* immediately during eoi broadcast, so we need +* to emulate this behavior. 
Otherwise, for +* guests who has not registered handler of a +* level irq, this irq would be injected +* immediately after guest enables interrupt +* (which happens usually at the end of the +* common interrupt routine). This would lead +* guest can't move forward and may miss the +* possibility to get proper irq handler +* registered. So we need to give some breath to +* guest. TODO: 1 is too long? +
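The quoted patch is truncated above, so the part that actually arms the delayed work is not visible. As an assumption-level sketch of how the pieces fit together (the field name eoi_inject is taken from the visible hunks; the exact delay value is a guess, not necessarily the patch's):

/* at ioapic setup time: prepare the deferred re-injection work */
INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);

/*
 * in __kvm_ioapic_update_eoi(), once irq_eoi[i] reaches
 * IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT: skip the inline ioapic_service()
 * call and let the delayed work re-deliver the still-pending level
 * irq a little later, giving the guest a chance to register a handler.
 */
schedule_delayed_work(&ioapic->eoi_inject, HZ / 100);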
Re: [PATCH 2/2 v6] powerpc/kvm: common sw breakpoint instr across ppc
On 09.09.14 19:07, Madhavan Srinivasan wrote: > This patch extends the use of illegal instruction as software > breakpoint instruction across the ppc platform. Patch extends > booke program interrupt code to support software breakpoint. > > Signed-off-by: Madhavan Srinivasan > --- > Patch is only compile tested. Will really help if > someone can try it out and let me know the comments. I've squashed the following patch into this one to make it work on booke hv. Alex diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index c8e4da5..81bd8a0 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -238,7 +238,7 @@ kvm_handler BOOKE_INTERRUPT_EXTERNAL, EX_PARAMS(GEN), \ kvm_handler BOOKE_INTERRUPT_ALIGNMENT, EX_PARAMS(GEN), \ SPRN_SRR0, SPRN_SRR1,(NEED_DEAR | NEED_ESR) kvm_handler BOOKE_INTERRUPT_PROGRAM, EX_PARAMS(GEN), \ - SPRN_SRR0, SPRN_SRR1,NEED_ESR + SPRN_SRR0, SPRN_SRR1, (NEED_ESR | NEED_EMU) kvm_handler BOOKE_INTERRUPT_FP_UNAVAIL, EX_PARAMS(GEN), \ SPRN_SRR0, SPRN_SRR1, 0 kvm_handler BOOKE_INTERRUPT_AP_UNAVAIL, EX_PARAMS(GEN), \ @@ -348,7 +348,7 @@ kvm_handler BOOKE_INTERRUPT_INST_STORAGE, SPRN_SRR0, SPRN_SRR1, NEED_ESR kvm_handler BOOKE_INTERRUPT_EXTERNAL, SPRN_SRR0, SPRN_SRR1, 0 kvm_handler BOOKE_INTERRUPT_ALIGNMENT, \ SPRN_SRR0, SPRN_SRR1, (NEED_DEAR | NEED_ESR) -kvm_handler BOOKE_INTERRUPT_PROGRAM, SPRN_SRR0, SPRN_SRR1, NEED_ESR +kvm_handler BOOKE_INTERRUPT_PROGRAM, SPRN_SRR0, SPRN_SRR1, (NEED_ESR | NEED_EMU) kvm_handler BOOKE_INTERRUPT_FP_UNAVAIL, SPRN_SRR0, SPRN_SRR1, 0 kvm_handler BOOKE_INTERRUPT_SYSCALL, SPRN_SRR0, SPRN_SRR1, 0 kvm_handler BOOKE_INTERRUPT_AP_UNAVAIL, SPRN_SRR0, SPRN_SRR1, 0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Pass enum to kvmppc_get_last_inst
The kvmppc_get_last_inst function recently received a facelift that allowed us to pass an enum of the type of instruction we want to read into it rather than an unreadable boolean. Unfortunately, not all callers ended up passing the enum. This wasn't really an issue as "true" and "false" happen to match the two enum values we have, but it's still hard to read. Update all callers of kvmppc_get_last_inst() to follow the new calling convention. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/booke.c | 4 ++-- arch/powerpc/kvm/emulate.c | 2 +- arch/powerpc/kvm/emulate_loadstore.c | 2 +- arch/powerpc/kvm/powerpc.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ed5b0dd..9b55dec 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -992,12 +992,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, case BOOKE_INTERRUPT_DATA_STORAGE: case BOOKE_INTERRUPT_DTLB_MISS: case BOOKE_INTERRUPT_HV_PRIV: - emulated = kvmppc_get_last_inst(vcpu, false, &last_inst); + emulated = kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst); break; case BOOKE_INTERRUPT_PROGRAM: /* SW breakpoints arrive as illegal instructions on HV */ if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) - emulated = kvmppc_get_last_inst(vcpu, false, &last_inst); + emulated = kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst); break; default: break; diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index 005222b..5cc2e7a 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -219,7 +219,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) /* this default type might be overwritten by subcategories */ kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS); - emulated = kvmppc_get_last_inst(vcpu, false, &inst); + emulated = kvmppc_get_last_inst(vcpu, INST_GENERIC, &inst); if (emulated != EMULATE_DONE) return emulated; diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c index 0de4ffa..6d3c0ee 100644 --- a/arch/powerpc/kvm/emulate_loadstore.c +++ b/arch/powerpc/kvm/emulate_loadstore.c @@ -58,7 +58,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu) /* this default type might be overwritten by subcategories */ kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS); - emulated = kvmppc_get_last_inst(vcpu, false, &inst); + emulated = kvmppc_get_last_inst(vcpu, INST_GENERIC, &inst); if (emulated != EMULATE_DONE) return emulated; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 02a6e2d..441a79c 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -294,7 +294,7 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) { u32 last_inst; - kvmppc_get_last_inst(vcpu, false, &last_inst); + kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst); /* XXX Deliver Program interrupt to guest. */ pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst); r = RESUME_HOST; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast
>+static void kvm_ioapic_eoi_inject_work(struct work_struct *work) >+{ >+ int i, ret; >+ struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, >+ eoi_inject.work); >+ spin_lock(&ioapic->lock); >+ for (i = 0; i < IOAPIC_NUM_PINS; i++) { >+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; >+ >+ if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) >+ continue; >+ >+ if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) >+ ret = ioapic_service(ioapic, i, false); >+ } >+ spin_unlock(&ioapic->lock); >+} >+ Delete unused variable, ret. Currently, we call ioapic_service() immediately when we find the irq is still active during eoi broadcast. But for real hardware, there's some dealy between the EOI writing and irq delivery (system bus latency?). So we need to emulate this behavior. Otherwise, for a guest who haven't register a proper irq handler , it would stay in the interrupt routine as this irq would be re-injected immediately after guest enables interrupt. This would lead guest can't move forward and may miss the possibility to get proper irq handler registered (one example is windows guest resuming from hibernation). As there's no way to differ the unhandled irq from new raised ones, this patch solve this problems by scheduling a delayed work when the count of irq injected during eoi broadcast exceeds a threshold value. After this patch, the guest can move a little forward when there's no suitable irq handler in case it may register one very soon and for guest who has a bad irq detection routine ( such as note_interrupt() in linux ), this bad irq would be recognized soon as in the past. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Signed-off-by: Zhang Haoyu --- include/trace/events/kvm.h | 20 ++ virt/kvm/ioapic.c | 51 -- virt/kvm/ioapic.h | 6 ++ 3 files changed, 75 insertions(+), 2 deletions(-) diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 908925a..b05f688 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -95,6 +95,26 @@ TRACE_EVENT(kvm_ioapic_set_irq, __entry->coalesced ? " (coalesced)" : "") ); +TRACE_EVENT(kvm_ioapic_delayed_eoi_inj, + TP_PROTO(__u64 e), + TP_ARGS(e), + + TP_STRUCT__entry( + __field(__u64, e ) + ), + + TP_fast_assign( + __entry->e = e; + ), + + TP_printk("dst %x vec=%u (%s|%s|%s%s)%s", + (u8)(__entry->e >> 56), (u8)__entry->e, + __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode), + (__entry->e & (1<<11)) ? "logical" : "physical", + (__entry->e & (1<<15)) ? "level" : "edge", + (__entry->e & (1<<16)) ? 
"|masked" : "") +); + TRACE_EVENT(kvm_msi_set_irq, TP_PROTO(__u64 address, __u64 data), TP_ARGS(address, data), diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..a36c4c4 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -405,6 +405,24 @@ void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id) spin_unlock(&ioapic->lock); } +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{ + int i; + struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, +eoi_inject.work); + spin_lock(&ioapic->lock); + for (i = 0; i < IOAPIC_NUM_PINS; i++) { + union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + + if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) + continue; + + if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) + ioapic_service(ioapic, i, false); + } + spin_unlock(&ioapic->lock); +} + static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, struct kvm_ioapic *ioapic, int vector, int trigger_mode) { @@ -435,8 +453,33 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); ent->fields.remote_irr = 0; - if (ioapic->irr & (1 << i)) - ioapic_service(ioapic, i, false); + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { + ++ioapic->irq_eoi[i]; + if (ioapic->irq_eoi[i] == IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) { + /* +* Real hardware does not deliver the irq so +* immediately during eoi broadcast, so we need +* to emulate this behavior. Otherwise, for
[GIT PULL 0/7] KVM: s390: Fixes and features for next (3.18)
Paolo,

please have a look at the next bunch of s390 patches and consider to apply:

The following changes since commit fd2752352bbc98850d83b5448a288d8991590317:

  KVM: x86: use guest maxphyaddr to check MTRR values (2014-08-29 18:56:24 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-20140910

for you to fetch changes up to bfac1f59a1afb13a3cf225bffd04be99a49c51a6:

  KVM: s390/interrupt: remove double assignment (2014-09-10 12:19:45 +0200)

KVM: s390: Fixes and features for next (3.18)

1. Crypto/CPACF support: To enable the MSA4 instructions we have to
   provide a common control structure for each SIE control block
2. Two cleanups found by a static code checker: one redundant assignment
   and one useless if
3. Fix the page handling of the diag10 ballooning interface. If the
   guest freed the pages at absolute 0 some checks and frees were incorrect
4. Limit guests to 16TB
5. Add __must_check to interrupt injection code

Christian Borntraeger (6):
      KVM: s390: add __must_check to interrupt deliver functions
      KVM: s390: Limit guest size to 16TB
      KVM: s390: unintended fallthrough for external call
      KVM: s390: get rid of constant condition in ipte_unlock_simple
      KVM: s390/cmm: Fix prefix handling for diag 10 balloon
      KVM: s390/interrupt: remove double assignment

Tony Krowiak (1):
      KVM: CPACF: Enable MSA4 instructions for kvm guest

 arch/s390/include/asm/kvm_host.h | 14 +-
 arch/s390/kvm/diag.c             | 26 ++
 arch/s390/kvm/gaccess.c          |  3 +--
 arch/s390/kvm/interrupt.c        | 14 +++---
 arch/s390/kvm/kvm-s390.c         | 35 ++-
 arch/s390/kvm/kvm-s390.h         |  2 +-
 6 files changed, 74 insertions(+), 20 deletions(-)
[GIT PULL 3/7] KVM: s390: Limit guest size to 16TB
Currently we fill up a full 5 level page table to hold the guest mapping. Since commit "support gmap page tables with less than 5 levels" we can do better. Having more than 4 TB might be useful for some testing scenarios, so let's just limit ourselves to 16TB guest size. Having more than that is totally untested as I do not have enough swap space/memory. We continue to allow ucontrol the full size. Signed-off-by: Christian Borntraeger Acked-by: Cornelia Huck Cc: Martin Schwidefsky --- arch/s390/kvm/kvm-s390.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 2037738..b95d4a4 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -458,7 +458,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) if (type & KVM_VM_S390_UCONTROL) { kvm->arch.gmap = NULL; } else { - kvm->arch.gmap = gmap_alloc(current->mm, -1UL); + kvm->arch.gmap = gmap_alloc(current->mm, (1UL << 44) - 1); if (!kvm->arch.gmap) goto out_nogmap; kvm->arch.gmap->private = kvm; -- 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 4/7] KVM: s390: unintended fallthrough for external call
We must not fallthrough if the conditions for external call are not met. Signed-off-by: Christian Borntraeger Reviewed-by: Thomas Huth Cc: sta...@vger.kernel.org --- arch/s390/kvm/interrupt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index d56da1d..4abf819 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -86,6 +86,7 @@ static int __must_check __interrupt_is_deliverable(struct kvm_vcpu *vcpu, return 0; if (vcpu->arch.sie_block->gcr[0] & 0x2000ul) return 1; + return 0; case KVM_S390_INT_EMERGENCY: if (psw_extint_disabled(vcpu)) return 0; -- 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 5/7] KVM: s390: get rid of constant condition in ipte_unlock_simple
Due to the earlier check we know that ipte_lock_count must be 0. No need to add a useless if. Let's make clear that we are going to always wakeup when we execute that code. Signed-off-by: Christian Borntraeger Acked-by: Heiko Carstens --- arch/s390/kvm/gaccess.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 4653ac6..0f961a1 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -254,8 +254,7 @@ static void ipte_unlock_simple(struct kvm_vcpu *vcpu) new = old = ACCESS_ONCE(*ic); new.k = 0; } while (cmpxchg(&ic->val, old.val, new.val) != old.val); - if (!ipte_lock_count) - wake_up(&vcpu->kvm->arch.ipte_wq); + wake_up(&vcpu->kvm->arch.ipte_wq); out: mutex_unlock(&ipte_mutex); } -- 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 6/7] KVM: s390/cmm: Fix prefix handling for diag 10 balloon
The old handling of prefix pages was broken in the diag10 ballooner. We now rely on gmap_discard to check for start > end and do a slow path if the prefix swap pages are affected: 1. discard the pages from start to prefix 2. discard the absolute 0 pages 3. discard the pages after prefix swap to end Signed-off-by: Christian Borntraeger Reviewed-by: Thomas Huth --- arch/s390/kvm/diag.c | 26 ++ 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c index b374b6c..9254aff 100644 --- a/arch/s390/kvm/diag.c +++ b/arch/s390/kvm/diag.c @@ -28,22 +28,32 @@ static int diag_release_pages(struct kvm_vcpu *vcpu) start = vcpu->run->s.regs.gprs[(vcpu->arch.sie_block->ipa & 0xf0) >> 4]; end = vcpu->run->s.regs.gprs[vcpu->arch.sie_block->ipa & 0xf] + 4096; - if (start & ~PAGE_MASK || end & ~PAGE_MASK || start > end + if (start & ~PAGE_MASK || end & ~PAGE_MASK || start >= end || start < 2 * PAGE_SIZE) return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION); VCPU_EVENT(vcpu, 5, "diag release pages %lX %lX", start, end); vcpu->stat.diagnose_10++; - /* we checked for start > end above */ - if (end < prefix || start >= prefix + 2 * PAGE_SIZE) { + /* +* We checked for start >= end above, so lets check for the +* fast path (no prefix swap page involved) +*/ + if (end <= prefix || start >= prefix + 2 * PAGE_SIZE) { gmap_discard(vcpu->arch.gmap, start, end); } else { - if (start < prefix) - gmap_discard(vcpu->arch.gmap, start, prefix); - if (end >= prefix) - gmap_discard(vcpu->arch.gmap, -prefix + 2 * PAGE_SIZE, end); + /* +* This is slow path. gmap_discard will check for start +* so lets split this into before prefix, prefix, after +* prefix and let gmap_discard make some of these calls +* NOPs. +*/ + gmap_discard(vcpu->arch.gmap, start, prefix); + if (start <= prefix) + gmap_discard(vcpu->arch.gmap, 0, 4096); + if (end > prefix + 4096) + gmap_discard(vcpu->arch.gmap, 4096, 8192); + gmap_discard(vcpu->arch.gmap, prefix + 2 * PAGE_SIZE, end); } return 0; } -- 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 2/7] KVM: s390: add __must_check to interrupt deliver functions
We now propagate interrupt injection errors back to the ioctl. We should mark functions that might fail with __must_check. Signed-off-by: Christian Borntraeger Acked-by: Jens Freimann --- arch/s390/kvm/interrupt.c | 12 ++-- arch/s390/kvm/kvm-s390.h | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 60a5cf4..d56da1d 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -28,7 +28,7 @@ #define IOINT_AI_MASK 0x0400 #define PFAULT_INIT 0x0600 -static int deliver_ckc_interrupt(struct kvm_vcpu *vcpu); +static int __must_check deliver_ckc_interrupt(struct kvm_vcpu *vcpu); static int is_ioint(u64 type) { @@ -77,7 +77,7 @@ static u64 int_word_to_isc_bits(u32 int_word) return (0x80 >> isc) << 24; } -static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu, +static int __must_check __interrupt_is_deliverable(struct kvm_vcpu *vcpu, struct kvm_s390_interrupt_info *inti) { switch (inti->type) { @@ -225,7 +225,7 @@ static u16 get_ilc(struct kvm_vcpu *vcpu) } } -static int __deliver_prog_irq(struct kvm_vcpu *vcpu, +static int __must_check __deliver_prog_irq(struct kvm_vcpu *vcpu, struct kvm_s390_pgm_info *pgm_info) { int rc = 0; @@ -307,7 +307,7 @@ static int __deliver_prog_irq(struct kvm_vcpu *vcpu, return rc; } -static int __do_deliver_interrupt(struct kvm_vcpu *vcpu, +static int __must_check __do_deliver_interrupt(struct kvm_vcpu *vcpu, struct kvm_s390_interrupt_info *inti) { const unsigned short table[] = { 2, 4, 4, 6 }; @@ -508,7 +508,7 @@ static int __do_deliver_interrupt(struct kvm_vcpu *vcpu, return rc; } -static int deliver_ckc_interrupt(struct kvm_vcpu *vcpu) +static int __must_check deliver_ckc_interrupt(struct kvm_vcpu *vcpu) { int rc; @@ -657,7 +657,7 @@ void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu) &vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].ctrl); } -int kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu) +int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu) { struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int; struct kvm_s390_float_interrupt *fi = vcpu->arch.local_int.float_int; diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 99abcb5..b1a7766 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -138,7 +138,7 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm) int kvm_s390_handle_wait(struct kvm_vcpu *vcpu); void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu); enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer); -int kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu); +int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu); void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu); void kvm_s390_clear_float_irqs(struct kvm *kvm); int __must_check kvm_s390_inject_vm(struct kvm *kvm, -- 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 7/7] KVM: s390/interrupt: remove double assignment
r is already initialized to 0. Signed-off-by: Christian Borntraeger Reviewed-by: Thomas Huth --- arch/s390/kvm/interrupt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 4abf819..4cad00a 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -1352,7 +1352,6 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr) r = enqueue_floating_irq(dev, attr); break; case KVM_DEV_FLIC_CLEAR_IRQS: - r = 0; kvm_s390_clear_float_irqs(dev->kvm); break; case KVM_DEV_FLIC_APF_ENABLE: -- 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 1/7] KVM: CPACF: Enable MSA4 instructions for kvm guest
From: Tony Krowiak We have to provide a per guest crypto block for the CPUs to enable MSA4 instructions. According to icainfo on z196 or later this enables CCM-AES-128, CMAC-AES-128, CMAC-AES-192 and CMAC-AES-256. Signed-off-by: Tony Krowiak Reviewed-by: David Hildenbrand Reviewed-by: Cornelia Huck Reviewed-by: Michael Mueller Signed-off-by: Christian Borntraeger [split MSA4/protected key into two patches] --- arch/s390/include/asm/kvm_host.h | 14 +- arch/s390/kvm/kvm-s390.c | 33 + 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index a76a124..1a6f6fd 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -157,7 +157,9 @@ struct kvm_s390_sie_block { __u8armid; /* 0x00e3 */ __u8reservede4[4]; /* 0x00e4 */ __u64 tecmc; /* 0x00e8 */ - __u8reservedf0[16]; /* 0x00f0 */ + __u8reservedf0[12]; /* 0x00f0 */ +#define CRYCB_FORMAT1 0x0001 + __u32 crycbd; /* 0x00fc */ __u64 gcr[16];/* 0x0100 */ __u64 gbea; /* 0x0180 */ __u8reserved188[24];/* 0x0188 */ @@ -410,6 +412,15 @@ struct s390_io_adapter { #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8) #define MAX_S390_ADAPTER_MAPS 256 +struct kvm_s390_crypto { + struct kvm_s390_crypto_cb *crycb; + __u32 crycbd; +}; + +struct kvm_s390_crypto_cb { + __u8reserved00[128];/* 0x */ +}; + struct kvm_arch{ struct sca_block *sca; debug_info_t *dbf; @@ -423,6 +434,7 @@ struct kvm_arch{ struct s390_io_adapter *adapters[MAX_S390_IO_ADAPTERS]; wait_queue_head_t ipte_wq; spinlock_t start_stop_lock; + struct kvm_s390_crypto crypto; }; #define KVM_HVA_ERR_BAD(-1UL) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 628e992..2037738 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -392,6 +392,22 @@ long kvm_arch_vm_ioctl(struct file *filp, return r; } +static int kvm_s390_crypto_init(struct kvm *kvm) +{ + if (!test_vfacility(76)) + return 0; + + kvm->arch.crypto.crycb = kzalloc(sizeof(*kvm->arch.crypto.crycb), +GFP_KERNEL | GFP_DMA); + if (!kvm->arch.crypto.crycb) + return -ENOMEM; + + kvm->arch.crypto.crycbd = (__u32) (unsigned long) kvm->arch.crypto.crycb | + CRYCB_FORMAT1; + + return 0; +} + int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { int rc; @@ -429,6 +445,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) if (!kvm->arch.dbf) goto out_nodbf; + if (kvm_s390_crypto_init(kvm) < 0) + goto out_crypto; + spin_lock_init(&kvm->arch.float_int.lock); INIT_LIST_HEAD(&kvm->arch.float_int.list); init_waitqueue_head(&kvm->arch.ipte_wq); @@ -453,6 +472,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) return 0; out_nogmap: + kfree(kvm->arch.crypto.crycb); +out_crypto: debug_unregister(kvm->arch.dbf); out_nodbf: free_page((unsigned long)(kvm->arch.sca)); @@ -507,6 +528,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_free_vcpus(kvm); free_page((unsigned long)(kvm->arch.sca)); debug_unregister(kvm->arch.dbf); + kfree(kvm->arch.crypto.crycb); if (!kvm_is_ucontrol(kvm)) gmap_free(kvm->arch.gmap); kvm_s390_destroy_adapters(kvm); @@ -588,6 +610,14 @@ int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu) return 0; } +static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu) +{ + if (!test_vfacility(76)) + return; + + vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd; +} + void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu) { free_page(vcpu->arch.sie_block->cbrlo); @@ -634,6 +664,9 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) vcpu->arch.ckc_timer.function = 
kvm_s390_idle_wakeup; get_cpu_id(&vcpu->arch.cpu_id); vcpu->arch.cpu_id.version = 0xff; + + kvm_s390_vcpu_crypto_setup(vcpu); + return rc; } -- 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
On Wed, Aug 27, 2014 at 06:17:36PM +0800, Tang Chen wrote: > We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the > address of > apic access page. So use this macro. Reviewed-by: Gleb Natapov > > Signed-off-by: Tang Chen > --- > arch/x86/kvm/svm.c | 3 ++- > arch/x86/kvm/vmx.c | 6 +++--- > 2 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index ddf7427..1d941ad 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm > *kvm, unsigned int id) > svm->asid_generation = 0; > init_vmcb(svm); > > - svm->vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE; > + svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | > +MSR_IA32_APICBASE_ENABLE; > if (kvm_vcpu_is_bsp(&svm->vcpu)) > svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP; > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index bfe11cf..4b80ead 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm) > goto out; > kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT; > kvm_userspace_mem.flags = 0; > - kvm_userspace_mem.guest_phys_addr = 0xfee0ULL; > + kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE; > kvm_userspace_mem.memory_size = PAGE_SIZE; > r = __kvm_set_memory_region(kvm, &kvm_userspace_mem); > if (r) > goto out; > > - page = gfn_to_page(kvm, 0xfee00); > + page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); > if (is_error_page(page)) { > r = -EFAULT; > goto out; > @@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) > > vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val(); > kvm_set_cr8(&vmx->vcpu, 0); > - apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE; > + apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; > if (kvm_vcpu_is_bsp(&vmx->vcpu)) > apic_base_msr.data |= MSR_IA32_APICBASE_BSP; > apic_base_msr.host_initiated = true; > -- > 1.8.3.1 > -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.
On Wed, Aug 27, 2014 at 06:17:37PM +0800, Tang Chen wrote: > kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But > it is never used to refer to the page at all. > > In vcpu initialization, it indicates two things: > 1. indicates if ept page is allocated > 2. indicates if a memory slot for identity page is initialized > > Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept > identity pagetable is initialized. So we can remove ept_identity_pagetable. > > NOTE: In the original code, ept identity pagetable page is pinned in memroy. > As a result, it cannot be migrated/hot-removed. After this patch, since > kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page > is no longer pinned in memory. And it can be migrated/hot-removed. Reviewed-by: Gleb Natapov > > Signed-off-by: Tang Chen > --- > arch/x86/include/asm/kvm_host.h | 1 - > arch/x86/kvm/vmx.c | 50 > - > arch/x86/kvm/x86.c | 2 -- > 3 files changed, 25 insertions(+), 28 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 7c492ed..35171c7 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -580,7 +580,6 @@ struct kvm_arch { > > gpa_t wall_clock; > > - struct page *ept_identity_pagetable; > bool ept_identity_pagetable_done; > gpa_t ept_identity_map_addr; > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 4b80ead..953d529 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment > *var); > static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu); > static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx); > static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx); > +static int alloc_identity_pagetable(struct kvm *kvm); > > static DEFINE_PER_CPU(struct vmcs *, vmxarea); > static DEFINE_PER_CPU(struct vmcs *, current_vmcs); > @@ -3938,21 +3939,27 @@ out: > > static int init_rmode_identity_map(struct kvm *kvm) > { > - int i, idx, r, ret; > + int i, idx, r, ret = 0; > pfn_t identity_map_pfn; > u32 tmp; > > if (!enable_ept) > return 1; > - if (unlikely(!kvm->arch.ept_identity_pagetable)) { > - printk(KERN_ERR "EPT: identity-mapping pagetable " > - "haven't been allocated!\n"); > - return 0; > + > + /* Protect kvm->arch.ept_identity_pagetable_done. */ > + mutex_lock(&kvm->slots_lock); > + > + if (likely(kvm->arch.ept_identity_pagetable_done)) { > + ret = 1; > + goto out2; > } > - if (likely(kvm->arch.ept_identity_pagetable_done)) > - return 1; > - ret = 0; > + > identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT; > + > + r = alloc_identity_pagetable(kvm); > + if (r) > + goto out2; > + > idx = srcu_read_lock(&kvm->srcu); > r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE); > if (r < 0) > @@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm) > ret = 1; > out: > srcu_read_unlock(&kvm->srcu, idx); > + > +out2: > + mutex_unlock(&kvm->slots_lock); > return ret; > } > > @@ -4019,31 +4029,23 @@ out: > > static int alloc_identity_pagetable(struct kvm *kvm) > { > - struct page *page; > + /* > + * In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done > + * is checked before calling this function and set to true after the > + * calling. The access to kvm->arch.ept_identity_pagetable_done should > + * be protected by kvm->slots_lock. 
> + */ > + > struct kvm_userspace_memory_region kvm_userspace_mem; > int r = 0; > > - mutex_lock(&kvm->slots_lock); > - if (kvm->arch.ept_identity_pagetable) > - goto out; > kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT; > kvm_userspace_mem.flags = 0; > kvm_userspace_mem.guest_phys_addr = > kvm->arch.ept_identity_map_addr; > kvm_userspace_mem.memory_size = PAGE_SIZE; > r = __kvm_set_memory_region(kvm, &kvm_userspace_mem); > - if (r) > - goto out; > > - page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT); > - if (is_error_page(page)) { > - r = -EFAULT; > - goto out; > - } > - > - kvm->arch.ept_identity_pagetable = page; > -out: > - mutex_unlock(&kvm->slots_lock); > return r; > } > > @@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm > *kvm, unsigned int id) > kvm->arch.ept_identity_map_addr = > VMX_EPT_IDENTITY_PAGETABLE_ADDR; > err = -ENOMEM; > - if (alloc_identity_paget
Re: [PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().
On Tue, Sep 09, 2014 at 03:13:07PM +0800, tangchen wrote: > Hi Gleb, > > On 09/03/2014 11:04 PM, Gleb Natapov wrote: > >On Wed, Sep 03, 2014 at 09:42:30AM +0800, tangchen wrote: > >>Hi Gleb, > >> > >>On 09/03/2014 12:00 AM, Gleb Natapov wrote: > >>>.. > >>>+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) > >>>+{ > >>>+ /* > >>>+ * apic access page could be migrated. When the page is being migrated, > >>>+ * GUP will wait till the migrate entry is replaced with the new pte > >>>+ * entry pointing to the new page. > >>>+ */ > >>>+ vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm, > >>>+ APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); > >>>+ kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, > >>>+ page_to_phys(vcpu->kvm->arch.apic_access_page)); > >>>I am a little bit worried that here all vcpus write to > >>>vcpu->kvm->arch.apic_access_page > >>>without any locking. It is probably benign since pointer write is atomic > >>>on x86. Paolo? > >>> > >>>Do we even need apic_access_page? Why not call > >>> gfn_to_page(APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT) > >>> put_page() > >>>on rare occasions we need to know its address? > >>Isn't it a necessary item defined in hardware spec ? > >> > >vcpu->kvm->arch.apic_access_page? No. This is internal kvm data structure. > > > >>I didn't read intel spec deeply, but according to the code, the page's > >>address is > >>written into vmcs. And it made me think that we cannnot remove it. > >> > >We cannot remove writing of apic page address into vmcs, but this is not > >done by > >assigning to vcpu->kvm->arch.apic_access_page, but by vmwrite in > >set_apic_access_page_addr(). > > OK, I'll try to remove kvm->arch.apic_access_page and send a patch for it > soon. > > BTW, if you don't have objection to the first two patches, would you please > help to > commit them first ? > I acked them and CCed Paolo to this reply. I hope he will look at the series too. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
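To make the suggestion quoted above concrete, the on-demand lookup Gleb describes would look roughly like this (hypothetical helper, shown only as an illustration of the gfn_to_page()/put_page() pattern, not code from the series):

/*
 * Look the APIC access page up when it is needed, write its physical
 * address into the VMCS, and drop the reference immediately so the page
 * is never pinned and can be migrated/hot-removed.
 */
static void sketch_reload_apic_access_page(struct kvm_vcpu *vcpu)
{
	struct page *page;

	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
	if (is_error_page(page))
		return;

	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, page_to_phys(page));

	/* no long-lived cache in kvm->arch: release the reference right away */
	put_page(page);
}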
Re: [PATCH v2] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()
On 09/09/14 11:27, Ard Biesheuvel wrote: > The ISS encoding for an exception from a Data Abort has a WnR > bit[6] that indicates whether the Data Abort was caused by a > read or a write instruction. While there are several fields > in the encoding that are only valid if the ISV bit[24] is set, > WnR is not one of them, so we can read it unconditionally. > > Instead of fixing both implementations of kvm_is_write_fault() > in place, reimplement it just once using kvm_vcpu_dabt_iswrite(), > which already does the right thing with respect to the WnR bit. > Also fix up the callers to pass 'vcpu' > > Acked-by: Laszlo Ersek > Signed-off-by: Ard Biesheuvel Because I like that kind of diffstat: Acked-by: Marc Zyngier Christoffer, if you too are happy with that, I'll queue it right away. Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
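The diff itself is not quoted in this reply; based on the commit message, the shared implementation plausibly reduces to something like the following sketch (not a verbatim copy of Ard's patch):

static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
{
        /* A prefetch (instruction) abort can never be a write. */
        if (kvm_vcpu_trap_is_iabt(vcpu))
                return false;

        /* For a data abort, the WnR bit is valid even when ISV is clear. */
        return kvm_vcpu_dabt_iswrite(vcpu);
}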
[PATCH] KVM: ioapic: add check for RTC_GSI
cppcheck found the following error: [ioapic.c:114]: (error) Array index -1 is out of bounds. If CONFIG_X86 is not defined, RTC_GSI == -1U which means that an out of bounds error could occur when accessing &ioapic->redirtbl[RTC_GSI]. This patch adds a check to kvm_rtc_eoi_tracking_restore_one that is similar to how kvm_rtc_eoi_tracking_restore_all checks this condition. Signed-off-by: Chris J Arges --- virt/kvm/ioapic.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..859f5b8 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -136,6 +136,9 @@ void kvm_rtc_eoi_tracking_restore_one(struct kvm_vcpu *vcpu) { struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic; + if (RTC_GSI >= IOAPIC_NUM_PINS) + return; + spin_lock(&ioapic->lock); __rtc_irq_eoi_tracking_restore_one(vcpu); spin_unlock(&ioapic->lock); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: EVENTFD: remove inclusion of irq.h
On Mon, Sep 01, 2014 at 12:11:19PM +0200, Paolo Bonzini wrote: > Il 01/09/2014 10:36, Eric Auger ha scritto: > > No more needed. irq.h would be void on ARM. > > > > Signed-off-by: Eric Auger > > > > --- > > > > I don't think irq.h is needed anymore since Paul Mackerras' work. However > > I did not compile for all architectures. > > --- > > virt/kvm/eventfd.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > index 3c5981c..0c712a7 100644 > > --- a/virt/kvm/eventfd.c > > +++ b/virt/kvm/eventfd.c > > @@ -36,7 +36,6 @@ > > #include > > #include > > > > -#include "irq.h" > > #include "iodev.h" > > > > #ifdef CONFIG_HAVE_KVM_IRQFD > > > > Acked-by: Paolo Bonzini > > Christoffer, please include this via the ARM tree, together with ARM > irqfd support. Thanks, > Marc is dealing with the tree this week and the next so he can apply it to kvmarm/queue. Do you want it to wait and go with the irqfd patch (which has dependencies not yet resolved) or should we just queue it? Thanks, -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 5/9] KVM: KVM-VFIO: update user API to program forwarded IRQ
On Mon, Sep 01, 2014 at 02:52:44PM +0200, Eric Auger wrote: > add new device group commands: > - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and > KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ > > which enable to turn forwarded IRQ mode on/off. > > the kvm_arch_forwarded_irq struct embodies a forwarded IRQ > > Signed-off-by: Eric Auger > > --- > > v1 -> v2: > - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h > to include/uapi/linux/kvm.h > also irq_index renamed into index and guest_irq renamed into gsi > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD > --- > Documentation/virtual/kvm/devices/vfio.txt | 26 ++ > include/uapi/linux/kvm.h | 9 + > 2 files changed, 35 insertions(+) > > diff --git a/Documentation/virtual/kvm/devices/vfio.txt > b/Documentation/virtual/kvm/devices/vfio.txt > index ef51740..048baa0 100644 > --- a/Documentation/virtual/kvm/devices/vfio.txt > +++ b/Documentation/virtual/kvm/devices/vfio.txt > @@ -13,6 +13,7 @@ VFIO-group is held by KVM. > > Groups: >KVM_DEV_VFIO_GROUP > + KVM_DEV_VFIO_DEVICE > > KVM_DEV_VFIO_GROUP attributes: >KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking > @@ -20,3 +21,28 @@ KVM_DEV_VFIO_GROUP attributes: > > For each, kvm_device_attr.addr points to an int32_t file descriptor > for the VFIO group. > + > +KVM_DEV_VFIO_DEVICE attributes: > + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ > + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ > + > +For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct. > +This user API makes possible to create a special IRQ handling mode, KVM_DEV_VFIO_DEVICE_FORWARD_IRQ enables a special IRQ handling mode on hardware that supports it, > +where KVM and a VFIO platform driver collaborate to improve IRQ > +handling performance. > + > +'fd represents the file descriptor of a valid VFIO device whose physical fd is described out of context here. Can you copy the struct definition into this document, perhaps right after the "For each, ..." line above. > +IRQ, referenced by its index, is injected into the VM guest irq (gsi). as a virtual IRQ (specified by the gsi field) into the VM. > + > +On FORWARD_IRQ, KVM-VFIO device programs: When setting the KVM_DEV_VFIO_DEVICE_FORWARD_IRQ attribute, the KVM-VFIO device tells the host (or VFIO?) to not complete the physical IRQ, and instead ensures that KVM (or the VM) completes the physical IRQ. > +- the host, to not complete the physical IRQ itself. > +- the GIC, to automatically complete the physical IRQ when the guest > + completes the virtual IRQ. and drop this bullet form. > +This avoids trapping the end-of-interrupt for level sensitive IRQ. avoid this last line, it's specific to ARM. > + > +On UNFORWARD_IRQ, one returns to the mode where the host completes the When setting the KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ attribute, the host (VFIO?) will again complete the physical IRQ and KVM will not... > +physical IRQ and the guest completes the virtual IRQ. > + > +It is up to the caller of this API to make sure the IRQ is not > +outstanding when the FORWARD/UNFORWARD is called. This could lead to outstanding? can you be specific? don't refer to FOWARD/UNFORWARD, either refer to these attributes by their full name or use a clear reference in proper English. > +some inconsistency on who is going to complete the IRQ. This sounds like the whole thing is fragile and if userspace doesn't do things right, IRQ handling of a piece of hardware is going to be inconsistent? Is this the case? If so, we need some stronger semantics. If not, this should be rephrased. 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index cf3a2ff..8cd7b0e 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -947,6 +947,12 @@ struct kvm_device_attr { > __u64 addr; /* userspace address of attr data */ > }; > > +struct kvm_arch_forwarded_irq { > + __u32 fd; /* file desciptor of the VFIO device */ > + __u32 index; /* VFIO device IRQ index */ > + __u32 gsi; /* gsi, ie. virtual IRQ number */ > +}; > + > #define KVM_DEV_TYPE_FSL_MPIC_20 1 > #define KVM_DEV_TYPE_FSL_MPIC_42 2 > #define KVM_DEV_TYPE_XICS3 > @@ -954,6 +960,9 @@ struct kvm_device_attr { > #define KVM_DEV_VFIO_GROUP 1 > #define KVM_DEV_VFIO_GROUP_ADD 1 > #define KVM_DEV_VFIO_GROUP_DEL 2 > +#define KVM_DEV_VFIO_DEVICE 2 > +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ1 > +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2 > #define KVM_DEV_TYPE_ARM_VGIC_V2 5 > #define KVM_DEV_TYPE_FLIC6 > > -- > 1.9.1 > Thanks, -Christoffer -- To unsubscribe from this list: send the line "unsu
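As an illustration of the proposed user API, a minimal userspace sketch follows. It assumes the uapi additions from this RFC (struct kvm_arch_forwarded_irq and the KVM_DEV_VFIO_DEVICE_* defines) are present in the installed headers, which is not the case for mainline.

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * kvm_vfio_dev_fd: fd of the created KVM-VFIO device
 * vfio_dev_fd:     fd of the VFIO platform device
 * index:           VFIO device IRQ index
 * gsi:             virtual IRQ number seen by the guest
 */
static int forward_irq(int kvm_vfio_dev_fd, int vfio_dev_fd,
                       unsigned int index, unsigned int gsi)
{
        struct kvm_arch_forwarded_irq fwd = {
                .fd    = vfio_dev_fd,
                .index = index,
                .gsi   = gsi,
        };
        struct kvm_device_attr attr = {
                .group = KVM_DEV_VFIO_DEVICE,
                .attr  = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
                .addr  = (__u64)(unsigned long)&fwd,
        };

        /* Use KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ with the same payload to switch back. */
        return ioctl(kvm_vfio_dev_fd, KVM_SET_DEVICE_ATTR, &attr);
}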
Re: [RFC v2 7/9] KVM: KVM-VFIO: add new VFIO external API hooks
On Mon, Sep 01, 2014 at 02:52:46PM +0200, Eric Auger wrote: > add functions that implement the gateway to the extended Capital letter when beginning a new sentence. Also the reference to 'the extended VFIO API' feels a bit weird. Can't you make your commit message a little more descriptive of this patch, something along the lines of: Provide wrapper functions that allows KVM-VFIO device code to get an external handle on a struct vfio_device based on a vfio device file descriptor. We provide this through three new functions: (assuming I got this right). > external VFIO API: > - kvm_vfio_device_get_external_user > - kvm_vfio_device_put_external_user > - kvm_vfio_external_base_device > > Signed-off-by: Eric Auger > > --- > > v1 -> v2: > - kvm_vfio_external_get_base_device renamed into > kvm_vfio_external_base_device > - kvm_vfio_external_get_type removed > --- > arch/arm/include/asm/kvm_host.h | 5 + > virt/kvm/vfio.c | 45 > + > 2 files changed, 50 insertions(+) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 6dfb404..1aee6bb 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -171,6 +171,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long > hva, pte_t pte); > unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu); > int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices); > > +struct vfio_device; > +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep); > +void kvm_vfio_device_put_external_user(struct vfio_device *vdev); > +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev); > + > /* We do not have shadow page tables, hence the empty hooks */ > static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva) > { > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > index ba1a93f..76dc7a1 100644 > --- a/virt/kvm/vfio.c > +++ b/virt/kvm/vfio.c > @@ -59,6 +59,51 @@ static void kvm_vfio_group_put_external_user(struct > vfio_group *vfio_group) > symbol_put(vfio_group_put_external_user); > } > > +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep) > +{ > + struct vfio_device *vdev; > + struct vfio_device *(*fn)(struct file *); > + > + fn = symbol_get(vfio_device_get_external_user); > + if (!fn) > + return ERR_PTR(-EINVAL); > + > + vdev = fn(filep); > + > + symbol_put(vfio_device_get_external_user); > + > + return vdev; > +} > + > +void kvm_vfio_device_put_external_user(struct vfio_device *vdev) > +{ > + void (*fn)(struct vfio_device *); > + > + fn = symbol_get(vfio_device_put_external_user); > + if (!fn) > + return; > + > + fn(vdev); > + > + symbol_put(vfio_device_put_external_user); > +} > + > +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev) > +{ > + struct device *(*fn)(struct vfio_device *); > + struct device *dev; > + > + fn = symbol_get(vfio_external_base_device); > + if (!fn) > + return NULL; > + > + dev = fn(vdev); > + > + symbol_put(vfio_external_base_device); > + > + return dev; > +} > + > static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group) > { > long (*fn)(struct vfio_group *, unsigned long); > -- > 1.9.1 > otherwise looks good to me! -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote: > This patch introduces a new KVM_DEV_VFIO_DEVICE attribute. > > This is a new control channel which enables KVM to cooperate with > viable VFIO devices. > > The kvm-vfio device now holds a list of devices (kvm_vfio_device) > in addition to a list of groups (kvm_vfio_group). The new > infrastructure enables to check the validity of the VFIO device > file descriptor, get and hold a reference to it. > > The first concrete implemented command is IRQ forward control: > KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. > > It consists in programing the VFIO driver and KVM in a consistent manner > so that an optimized IRQ injection/completion is set up. Each > kvm_vfio_device holds a list of forwarded IRQ. When putting a > kvm_vfio_device, the implementation makes sure the forwarded IRQs > are set again in the normal handling state (non forwarded). 'putting a kvm_vfio_device' sounds to like you're golf'ing :) When a kvm_vfio_device is released? > > The forwarding programmming is architecture specific, embodied by the > kvm_arch_set_fwd_state function. Its implementation is given in a > separate patch file. I would drop the last sentence and instead indicate that this is handled properly when the architecture does not support such a feature. > > The forwarding control modality is enabled by the > __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define. > > Signed-off-by: Eric Auger > > --- > > v1 -> v2: > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > - original patch file separated into 2 parts: generic part moved in vfio.c > and ARM specific part(kvm_arch_set_fwd_state) > --- > include/linux/kvm_host.h | 27 +++ > virt/kvm/vfio.c | 452 > ++- > 2 files changed, 477 insertions(+), 2 deletions(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index a4c33b3..24350dc 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1065,6 +1065,21 @@ struct kvm_device_ops { > unsigned long arg); > }; > > +enum kvm_fwd_irq_action { > + KVM_VFIO_IRQ_SET_FORWARD, > + KVM_VFIO_IRQ_SET_NORMAL, > + KVM_VFIO_IRQ_CLEANUP, This is KVM internal API, so it would probably be good to document this. Especially the CLEANUP bit worries me, see below. > +}; > + > +/* internal structure describing a forwarded IRQ */ > +struct kvm_fwd_irq { > + struct list_head link; this list entry is local to the kvm vfio device, right? that means you probably want a struct with just the below fields, and then have a containing struct in the generic device file, private to it's logic. > + __u32 index; /* platform device irq index */ > + __u32 hwirq; /*physical IRQ */ > + __u32 gsi; /* virtual IRQ */ > + struct kvm_vcpu *vcpu; /* vcpu to inject into*/ > +}; > + > void kvm_device_get(struct kvm_device *dev); > void kvm_device_put(struct kvm_device *dev); > struct kvm_device *kvm_device_from_filp(struct file *filp); > @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops; > extern struct kvm_device_ops kvm_arm_vgic_v2_ops; > extern struct kvm_device_ops kvm_flic_ops; > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd, what's the 'p' in pfwd? 
> +enum kvm_fwd_irq_action action); > + > +#else > +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd, > + enum kvm_fwd_irq_action action) > +{ > + return 0; > +} > +#endif > + > #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT > > static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > index 76dc7a1..e4a81c4 100644 > --- a/virt/kvm/vfio.c > +++ b/virt/kvm/vfio.c > @@ -18,14 +18,24 @@ > #include > #include > #include > +#include > > struct kvm_vfio_group { > struct list_head node; > struct vfio_group *vfio_group; > }; > > +struct kvm_vfio_device { > + struct list_head node; > + struct vfio_device *vfio_device; > + /* list of forwarded IRQs for that VFIO device */ > + struct list_head fwd_irq_list; > + int fd; > +}; > + > struct kvm_vfio { > struct list_head group_list; > + struct list_head device_list; > struct mutex lock; > bool noncoherent; > }; > @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, > long attr, u64 arg) > return -ENXIO; > } > > +/** > + * get_vfio_device - returns the vfio-device corresponding to this fd > + * @fd:fd of the vfio platform device > + * > + * checks it is a vfio device > + * increment its ref counter why the short lines? Just write this out in proper English. > + */ > +static struct vfio_device *kvm_vfio_get_vfio_device(int fd) > +{ > + struct fd f; > + struct vfio_device *vdev; > + > + f = fd
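Concretely, the struct split suggested earlier in this review (keep the bare forwarding description generic, keep the list linkage private to the kvm-vfio device) might look like the sketch below; the names are illustrative and not taken from any posted revision.

/* include/linux/kvm_host.h: bare description of one forwarded IRQ */
struct kvm_fwd_irq {
        __u32 index;            /* VFIO device IRQ index */
        __u32 hwirq;            /* physical IRQ */
        __u32 gsi;              /* virtual IRQ */
        struct kvm_vcpu *vcpu;  /* vcpu to inject into */
};

/* virt/kvm/vfio.c: list linkage stays private to the kvm-vfio device */
struct kvm_vfio_fwd_irq_node {
        struct list_head link;  /* entry in kvm_vfio_device->fwd_irq_list */
        struct kvm_fwd_irq fwd;
};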
Re: [RFC v2 9/9] KVM: KVM-VFIO: ARM forwarding control
On Mon, Sep 01, 2014 at 02:52:48PM +0200, Eric Auger wrote: > Enables forwarding control for ARM. By defining > __KVM_HAVE_ARCH_KVM_VFIO_FORWARD the patch enables > KVM_DEV_VFIO_DEVICE_FORWARD/UNFORWARD_IRQ command on ARM. As a > result it brings an optimized injection/completion handling for > forwarded IRQ. The ARM specific part is implemented in a new module, a new module ?!? you mean file, right? > kvm_vfio_arm.c > > Signed-off-by: Eric Auger > --- > arch/arm/include/asm/kvm_host.h | 2 + > arch/arm/kvm/Makefile | 2 +- > arch/arm/kvm/kvm_vfio_arm.c | 85 > + > 3 files changed, 88 insertions(+), 1 deletion(-) > create mode 100644 arch/arm/kvm/kvm_vfio_arm.c > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 1aee6bb..dfd3b05 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -25,6 +25,8 @@ > #include > #include > > +#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > + > #if defined(CONFIG_KVM_ARM_MAX_VCPUS) > #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS > #else > diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile > index ea1fa76..26a5a42 100644 > --- a/arch/arm/kvm/Makefile > +++ b/arch/arm/kvm/Makefile > @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o > $(KVM)/eventfd.o $(KVM)/vf > > obj-y += kvm-arm.o init.o interrupts.o > obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o > -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o > +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o > kvm_vfio_arm.o > obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o > obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o > obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o > diff --git a/arch/arm/kvm/kvm_vfio_arm.c b/arch/arm/kvm/kvm_vfio_arm.c > new file mode 100644 > index 000..0d316b1 > --- /dev/null > +++ b/arch/arm/kvm/kvm_vfio_arm.c > @@ -0,0 +1,85 @@ > +/* > + * Copyright (C) 2014 Linaro Ltd. > + * Authors: Eric Auger > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/** > + * kvm_arch_set_fwd_state - change the forwarded state of an IRQ > + * @pfwd: the forwarded irq struct > + * @action: action to perform (set forward, set back normal, cleanup) > + * > + * programs the GIC and VGIC > + * returns the VGIC map/unmap return status > + * It is the responsability of the caller to make sure the physical IRQ responsibility > + * is not active. there is a critical section between the start of the There > + * VFIO IRQ handler and LR programming. Did we implement code to ensure this in the previous patch? I don't think I noticed it. Doesn't disabling the IRQ have the desired effect? a critical section? who are the contenders of this, what action should I take to make sure access to the critical section is serialized? 
> + */ > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd, > +enum kvm_fwd_irq_action action) > +{ > + int ret; > + struct irq_desc *desc = irq_to_desc(pfwd->hwirq); > + struct irq_data *d = &desc->irq_data; > + struct irq_chip *chip = desc->irq_data.chip; > + > + disable_irq(pfwd->hwirq); > + /* no fwd state change can happen if the IRQ is in progress */ > + if (irqd_irq_inprogress(d)) { > + kvm_err("%s cannot change fwd state (IRQ %d in progress\n", > + __func__, pfwd->hwirq); > + enable_irq(pfwd->hwirq); > + return -1; -1? seems like you're defining some new error code convenstions here. -EBUSY perhaps? probably want to use a goto label here as well. > + } > + > + if (action == KVM_VFIO_IRQ_SET_FORWARD) { > + irqd_set_irq_forwarded(d); > + ret = vgic_map_phys_irq(pfwd->vcpu, > + pfwd->gsi + VGIC_NR_PRIVATE_IRQS, > + pfwd->hwirq); > + } else if (action == KVM_VFIO_IRQ_SET_NORMAL) { > + irqd_clr_irq_forwarded(d); > + ret = vgic_unmap_phys_irq(pfwd->vcpu, > + pfwd->gsi + > + VGIC_NR_PRIVATE_IRQS, > + pfwd->hwirq); > + } else if (action == KVM_VFIO_IRQ_CLEANUP) { > + irqd_clr_irq_fo
Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote: > On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote: > > This RFC proposes an integration of "ARM: Forwarding physical > > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in > > KVM. > > > > It enables to transform a VFIO platform driver IRQ into a forwarded > > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM > > switch can be avoided on guest virtual IRQ completion. Before this > > patch, a maintenance IRQ was triggered on the virtual IRQ completion. > > > > When the IRQ is forwarded, the VFIO platform driver does not need to > > disable the IRQ anymore. Indeed when returning from the IRQ handler > > the IRQ is not deactivated. Only its priority is lowered. This means > > the same IRQ cannot hit before the guest completes the virtual IRQ > > and the GIC automatically deactivates the corresponding physical IRQ. > > > > Besides, the injection still is based on irqfd triggering. The only > > impact on irqfd process is resamplefd is not called anymore on > > virtual IRQ completion since this latter becomes "transparent". > > > > The current integration is based on an extension of the KVM-VFIO > > device, previously used by KVM to interact with VFIO groups. The > > patch serie now enables KVM to directly interact with a VFIO > > platform device. The VFIO external API was extended for that purpose. > > > > Th KVM-VFIO device can get/put the vfio platform device, check its > > integrity and type, get the IRQ number associated to an IRQ index. > > > > The IRQ forward programming is architecture specific (virtual interrupt > > controller programming basically). However the whole infrastructure is > > kept generic. > > > > from a user point of view, the functionality is provided through new > > KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ > > and the capability can be checked with KVM_HAS_DEVICE_ATTR. > > Assignment can only be changed when the physical IRQ is not active. > > It is the responsability of the user to do this check. > > > > This patch serie has the following dependencies: > > - "ARM: Forwarding physical interrupts to a guest VM" > > (http://lwn.net/Articles/603514/) in > > - [PATCH v3] irqfd for ARM > > - and obviously the VFIO platform driver serie: > > [RFC PATCH v6 00/20] VFIO support for platform devices on ARM > > https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html > > > > Integrated pieces can be found at > > ssh://git.linaro.org/people/eric.auger/linux.git > > on branch 3.17rc3_irqfd_forward_integ_v2 > > > > This was was tested on Calxeda Midway, assigning the xgmac main IRQ. > > > > v1 -> v2: > > - forward control is moved from architecture specific file into generic > > vfio.c module. 
> > only kvm_arch_set_fwd_state remains architecture specific > > - integrate Kim's patch which enables KVM-VFIO for ARM > > - fix vgic state bypass in vgic_queue_hwirq > > - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h > > to include/uapi/linux/kvm.h > > also irq_index renamed into index and guest_irq renamed into gsi > > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD > > - vfio_external_get_base_device renamed into vfio_external_base_device > > - vfio_external_get_type removed > > - kvm_vfio_external_get_base_device renamed into > > kvm_vfio_external_base_device > > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > > > > Eric Auger (8): > > KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded > > IRQ > > KVM: ARM: VGIC: add forwarded irq rbtree lock > > VFIO: platform: handler tests whether the IRQ is forwarded > > KVM: KVM-VFIO: update user API to program forwarded IRQ > > VFIO: Extend external user API > > KVM: KVM-VFIO: add new VFIO external API hooks > > KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding > > control > > KVM: KVM-VFIO: ARM forwarding control > > > > Kim Phillips (1): > > ARM: KVM: Enable the KVM-VFIO device > > > > Documentation/virtual/kvm/devices/vfio.txt | 26 ++ > > arch/arm/include/asm/kvm_host.h| 7 + > > arch/arm/kvm/Kconfig | 1 + > > arch/arm/kvm/Makefile | 4 +- > > arch/arm/kvm/kvm_vfio_arm.c| 85 + > > drivers/vfio/platform/vfio_platform_irq.c | 7 +- > > drivers/vfio/vfio.c| 24 ++ > > include/kvm/arm_vgic.h | 1 + > > include/linux/kvm_host.h | 27 ++ > > include/linux/vfio.h | 3 + > > include/uapi/linux/kvm.h | 9 + > > virt/kvm/arm/vgic.c| 59 +++- > > virt/kvm/vfio.c| 497 > > - > > 13 files changed, 733 insertions(+), 17 deletions(-) > > create mode 100644 arch/arm/kvm/kvm_vfio_arm.c > > > > Have we ventured too far in the other di
Re: [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded
On Mon, Sep 01, 2014 at 02:52:43PM +0200, Eric Auger wrote: > In case the IRQ is forwarded, the VFIO platform IRQ handler does not > need to disable the IRQ anymore. In that mode, when the handler completes add a comma after completes > the IRQ is not deactivated but only its priority is lowered. > > Some other actor (typically a guest) is supposed to deactivate the IRQ, > allowing at that time a new physical IRQ to hit. > > In virtualization use case, the physical IRQ is automatically completed > by the interrupt controller when the guest completes the corresponding > virtual IRQ. > > Signed-off-by: Eric Auger > --- > drivers/vfio/platform/vfio_platform_irq.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/platform/vfio_platform_irq.c > b/drivers/vfio/platform/vfio_platform_irq.c > index 6768508..1f851b2 100644 > --- a/drivers/vfio/platform/vfio_platform_irq.c > +++ b/drivers/vfio/platform/vfio_platform_irq.c > @@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id) > struct vfio_platform_irq *irq_ctx = dev_id; > unsigned long flags; > int ret = IRQ_NONE; > + struct irq_data *d; > + bool is_forwarded; > > spin_lock_irqsave(&irq_ctx->lock, flags); > > if (!irq_ctx->masked) { > ret = IRQ_HANDLED; > + d = irq_get_irq_data(irq_ctx->hwirq); > + is_forwarded = irqd_irq_forwarded(d); > > - if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) { > + if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED && > + !is_forwarded) { > disable_irq_nosync(irq_ctx->hwirq); > irq_ctx->masked = true; > } > -- > 1.9.1 > It makes sense that these needs to be all controlled in the kernel, but I'm wondering if it would be cleaner / more correct to clear the AUTOMASKED flag when the IRQ is forwarded and have vfio refuse setting this flag as long as the irq is forwarded? -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
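The alternative raised at the end of this review, tracking the forwarding state when it changes rather than testing it in the interrupt handler, could take a shape along these lines. This is a sketch only: vfio_platform_irq_set_forwarded() is a hypothetical helper, and the companion check that would refuse user changes to AUTOMASKED while the IRQ is forwarded is omitted.

static void vfio_platform_irq_set_forwarded(struct vfio_platform_irq *irq_ctx,
                                            bool forwarded)
{
        unsigned long flags;

        spin_lock_irqsave(&irq_ctx->lock, flags);
        if (forwarded)
                /* The guest deactivates the IRQ at the GIC; never automask. */
                irq_ctx->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
        else
                /* Back to normal handling: automask level IRQs again. */
                irq_ctx->flags |= VFIO_IRQ_INFO_AUTOMASKED;
        spin_unlock_irqrestore(&irq_ctx->lock, flags);
}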
Re: [RFC v2 6/9] VFIO: Extend external user API
On Mon, Sep 01, 2014 at 02:52:45PM +0200, Eric Auger wrote: > New functions are added to be called from ARM KVM-VFIO device. This commit message seems somewhat random. This patch doesn't deal with anything ARM specific, it introduces some generic functions that allows users external to vfio itself to retrieve information about a vfio platform device. > > - vfio_device_get_external_user enables to get a vfio device from > its fd > - vfio_device_put_external_user puts the vfio device > - vfio_external_base_device returns the struct device*, > useful to access the platform_device > > Signed-off-by: Eric Auger > > --- > > v1 -> v2: > > - vfio_external_get_base_device renamed into vfio_external_base_device > - vfio_external_get_type removed > --- > drivers/vfio/vfio.c | 24 > include/linux/vfio.h | 3 +++ > 2 files changed, 27 insertions(+) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index 8e84471..282814e 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group > *group) > } > EXPORT_SYMBOL_GPL(vfio_group_put_external_user); > > +struct vfio_device *vfio_device_get_external_user(struct file *filep) > +{ > + struct vfio_device *vdev = filep->private_data; > + > + if (filep->f_op != &vfio_device_fops) > + return ERR_PTR(-EINVAL); > + > + vfio_device_get(vdev); > + return vdev; > +} > +EXPORT_SYMBOL_GPL(vfio_device_get_external_user); > + > +void vfio_device_put_external_user(struct vfio_device *vdev) > +{ > + vfio_device_put(vdev); > +} > +EXPORT_SYMBOL_GPL(vfio_device_put_external_user); > + > +struct device *vfio_external_base_device(struct vfio_device *vdev) > +{ > + return vdev->dev; > +} > +EXPORT_SYMBOL_GPL(vfio_external_base_device); > + > int vfio_external_user_iommu_id(struct vfio_group *group) > { > return iommu_group_id(group->iommu_group); > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index ffe04ed..bd4b6cb 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group > *group); > extern int vfio_external_user_iommu_id(struct vfio_group *group); > extern long vfio_external_check_extension(struct vfio_group *group, > unsigned long arg); > +extern struct vfio_device *vfio_device_get_external_user(struct file *filep); > +extern void vfio_device_put_external_user(struct vfio_device *vdev); > +extern struct device *vfio_external_base_device(struct vfio_device *vdev); > > struct pci_dev; > #ifdef CONFIG_EEH > -- > 1.9.1 > Looks good to me, -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()
On Tue, Sep 09, 2014 at 12:02:59PM +0100, Marc Zyngier wrote: > [resending, as ARM email server seems to be in some mood] > > On 09/09/14 11:27, Ard Biesheuvel wrote: > > The ISS encoding for an exception from a Data Abort has a WnR > > bit[6] that indicates whether the Data Abort was caused by a > > read or a write instruction. While there are several fields > > in the encoding that are only valid if the ISV bit[24] is set, > > WnR is not one of them, so we can read it unconditionally. > > > > Instead of fixing both implementations of kvm_is_write_fault() > > in place, reimplement it just once using kvm_vcpu_dabt_iswrite(), > > which already does the right thing with respect to the WnR bit. > > Also fix up the callers to pass 'vcpu' > > > > Acked-by: Laszlo Ersek > > Signed-off-by: Ard Biesheuvel > > Because I like that kind of diffstat: > Acked-by: Marc Zyngier > > Christoffer, if you too are happy with that, I'll queue it right away. > Extremely happy: Acked-by: Christoffer Dall Thanks, -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] ARM: KVM: add irqfd support
On Mon, Sep 01, 2014 at 10:53:04AM +0200, Eric Auger wrote: > This patch enables irqfd on ARM. > > irqfd framework enables to inject a virtual IRQ into a guest upon an > eventfd trigger. User-side uses KVM_IRQFD VM ioctl to provide KVM with > a kvm_irqfd struct that associates a VM, an eventfd, a virtual IRQ number > (aka. the gsi). When an actor signals the eventfd (typically a VFIO > platform driver), the kvm irqfd subsystem injects the provided virtual > IRQ into the guest. > > Resamplefd also is supported for level sensitive interrupts, ie. the > user can provide another eventfd that is triggered when the completion > of the virtual IRQ (gsi) is detected by the GIC. > > The gsi must correspond to a shared peripheral interrupt (SPI), ie the > GIC interrupt ID is gsi+32. > > this patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. > CONFIG_HAVE_KVM_IRQCHIP is removed. No IRQ routing table is used > (irqchip.c and irqcomm.c are not used). > > Both KVM_CAP_IRQFD & KVM_CAP_IRQFD_RESAMPLE capabilities are exposed > > Signed-off-by: Eric Auger > > --- > > This patch serie deprecates the previous serie featuring GSI routing > (https://patches.linaro.org/32261/) > > The patch serie has the following dependencies: > - arm/arm64: KVM: Various VGIC cleanups and improvements > https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/009979.html > - "KVM: EVENTFD: remove inclusion of irq.h" > > All pieces can be found on git://git.linaro.org/people/eric.auger/linux.git > branch irqfd_norouting_integ_v3 > > This work was tested with Calxeda Midway xgmac main interrupt with > qemu-system-arm and QEMU VFIO platform device. > > v2 -> v3: > - removal of irq.h from eventfd.c put in a separate patch to increase > visibility > - properly expose KVM_CAP_IRQFD capability in arm.c > - remove CONFIG_HAVE_KVM_IRQCHIP meaningfull only if irq_comm.c is used > > v1 -> v2: > - rebase on 3.17rc1 > - move of the dist unlock in process_maintenance > - remove of dist lock in __kvm_vgic_sync_hwstate > - rewording of the commit message (add resamplefd reference) > - remove irq.h > --- > Documentation/virtual/kvm/api.txt | 5 +++- > arch/arm/include/uapi/asm/kvm.h | 3 +++ > arch/arm/kvm/Kconfig | 4 +-- > arch/arm/kvm/Makefile | 2 +- > arch/arm/kvm/arm.c| 3 +++ > virt/kvm/arm/vgic.c | 56 > --- > 6 files changed, 65 insertions(+), 8 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index beae3fd..8118b12 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2204,7 +2204,7 @@ into the hash PTE second double word). > 4.75 KVM_IRQFD > > Capability: KVM_CAP_IRQFD > -Architectures: x86 s390 > +Architectures: x86 s390 arm > Type: vm ioctl > Parameters: struct kvm_irqfd (in) > Returns: 0 on success, -1 on error > @@ -2230,6 +2230,9 @@ Note that closing the resamplefd is not sufficient to > disable the > irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment > and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. > > +On ARM/arm64 the injected must be a shared peripheral interrupt (SPI). > +This means the programmed GIC interrupt ID is gsi+32. > + See above comment. 
> 4.76 KVM_PPC_ALLOCATE_HTAB > > Capability: KVM_CAP_PPC_ALLOC_HTAB > diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h > index e6ebdd3..3034c66 100644 > --- a/arch/arm/include/uapi/asm/kvm.h > +++ b/arch/arm/include/uapi/asm/kvm.h > @@ -194,6 +194,9 @@ struct kvm_arch_memory_slot { > /* Highest supported SPI, from VGIC_NR_IRQS */ > #define KVM_ARM_IRQ_GIC_MAX 127 > > +/* One single KVM irqchip, ie. the VGIC */ > +#define KVM_NR_IRQCHIPS 1 > + > /* PSCI interface */ > #define KVM_PSCI_FN_BASE 0x95c1ba5e > #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n)) > diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig > index 466bd29..e519a40 100644 > --- a/arch/arm/kvm/Kconfig > +++ b/arch/arm/kvm/Kconfig > @@ -24,6 +24,7 @@ config KVM > select KVM_MMIO > select KVM_ARM_HOST > depends on ARM_VIRT_EXT && ARM_LPAE > + select HAVE_KVM_EVENTFD > ---help--- > Support hosting virtualized guest machines. You will also > need to select one or more of the processor modules below. > @@ -55,7 +56,7 @@ config KVM_ARM_MAX_VCPUS > config KVM_ARM_VGIC > bool "KVM support for Virtual GIC" > depends on KVM_ARM_HOST && OF > - select HAVE_KVM_IRQCHIP > + select HAVE_KVM_IRQFD > default y > ---help--- > Adds support for a hardware assisted, in-kernel GIC emulation. > @@ -63,7 +64,6 @@ config KVM_ARM_VGIC > config KVM_ARM_TIMER > bool "KVM support for Architected Timers" > depends on KVM_ARM_VGIC && ARM_ARCH_TIMER > - select HAVE_KVM_IRQCHIP > default y >
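A minimal userspace sketch of the resulting API on ARM is shown below, assuming a VM fd with an in-kernel VGIC already set up; the resamplefd path for level-sensitive interrupts is left out for brevity.

#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Wire an eventfd to SPI 'gsi' (GIC interrupt ID gsi + 32). */
static int wire_irqfd(int vm_fd, unsigned int gsi)
{
        struct kvm_irqfd irqfd;
        int efd = eventfd(0, EFD_CLOEXEC);

        if (efd < 0)
                return -1;

        memset(&irqfd, 0, sizeof(irqfd));
        irqfd.fd = efd;
        irqfd.gsi = gsi;
        /*
         * For a level-sensitive source, also set KVM_IRQFD_FLAG_RESAMPLE
         * and fill irqfd.resamplefd.
         */

        if (ioctl(vm_fd, KVM_IRQFD, &irqfd) < 0) {
                close(efd);
                return -1;
        }

        /* Any 8-byte write to efd now injects the virtual SPI into the guest. */
        return efd;
}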
Re: [RFC v2 2/9] KVM: ARM: VGIC: add forwarded irq rbtree lock
On Mon, Sep 01, 2014 at 02:52:41PM +0200, Eric Auger wrote: > add a lock related to the rb tree manipulation. The rb tree can be Ok, I can't hold myself back any longer. Please begin sentences with a capital letter. You don't do this in French? :) > searched in one thread (irqfd handler for instance) and map/unmap > happen in another. > > Signed-off-by: Eric Auger > --- > include/kvm/arm_vgic.h | 1 + > virt/kvm/arm/vgic.c| 46 +- > 2 files changed, 38 insertions(+), 9 deletions(-) > > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h > index 743020f..3da244f 100644 > --- a/include/kvm/arm_vgic.h > +++ b/include/kvm/arm_vgic.h > @@ -177,6 +177,7 @@ struct vgic_dist { > unsigned long irq_pending_on_cpu; > > struct rb_root irq_phys_map; > + spinlock_t rb_tree_lock; > #endif > }; > > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c > index 8ef495b..dbc2a5a 100644 > --- a/virt/kvm/arm/vgic.c > +++ b/virt/kvm/arm/vgic.c > @@ -1630,9 +1630,15 @@ static struct rb_root *vgic_get_irq_phys_map(struct > kvm_vcpu *vcpu, > > int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq) > { > - struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq); > - struct rb_node **new = &root->rb_node, *parent = NULL; > + struct rb_root *root; > + struct rb_node **new, *parent = NULL; > struct irq_phys_map *new_map; > + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; > + > + spin_lock(&dist->rb_tree_lock); > + > + root = vgic_get_irq_phys_map(vcpu, virt_irq); > + new = &root->rb_node; > > /* Boilerplate rb_tree code */ > while (*new) { > @@ -1644,13 +1650,17 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int > virt_irq, int phys_irq) > new = &(*new)->rb_left; > else if (this->virt_irq > virt_irq) > new = &(*new)->rb_right; > - else > + else { > + spin_unlock(&dist->rb_tree_lock); > return -EEXIST; > + } can you initialize a ret variable to -EEXIST in the beginning of this function, and add an out label above the unlock below, replace this multi-line statement with a goto out, and set ret = 0 after the while loop? > } > > new_map = kzalloc(sizeof(*new_map), GFP_KERNEL); > - if (!new_map) > + if (!new_map) { > + spin_unlock(&dist->rb_tree_lock); > return -ENOMEM; then this becomes ret = -ENOMEM; goto out; > + } > > new_map->virt_irq = virt_irq; > new_map->phys_irq = phys_irq; > @@ -1658,6 +1668,8 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int > virt_irq, int phys_irq) > rb_link_node(&new_map->node, parent, new); > rb_insert_color(&new_map->node, root); > > + spin_unlock(&dist->rb_tree_lock); > + aren't you allocating memory with GFP_KERNEL while holding a spinlock here? > return 0; > } > > @@ -1685,24 +1697,39 @@ static struct irq_phys_map > *vgic_irq_map_search(struct kvm_vcpu *vcpu, > > int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq) > { > - struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq); > + struct irq_phys_map *map; > + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; > + int ret; > + > + spin_lock(&dist->rb_tree_lock); > + map = vgic_irq_map_search(vcpu, virt_irq); > > if (map) > - return map->phys_irq; > + ret = map->phys_irq; > + else > + ret = -ENOENT; initialize ret to -ENOENT and avoid the else statement. 
> + > + spin_unlock(&dist->rb_tree_lock); > + return ret; > > - return -ENOENT; > } > > int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq) > { > - struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq); > + struct irq_phys_map *map; > + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; > + > + spin_lock(&dist->rb_tree_lock); > + > + map = vgic_irq_map_search(vcpu, virt_irq); > > if (map && map->phys_irq == phys_irq) { > rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, virt_irq)); > kfree(map); > + spin_unlock(&dist->rb_tree_lock); can kfree sleep? I don't remember. In any case, you can unlock before calling kfree. > return 0; > } > - > + spin_unlock(&dist->rb_tree_lock); > return -ENOENT; an out label and single unlock location would be preferred here as well I think. > } > > @@ -1898,6 +1925,7 @@ int kvm_vgic_create(struct kvm *kvm) > } > > spin_lock_init(&kvm->arch.vgic.lock); > + spin_lock_init(&kvm->arch.vgic.rb_tree_lock); > kvm->arch.vgic.in_kernel = true; > kvm->arch.vgic.vctrl_base = vgic->vctrl_base; > kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF; > -- > 1.9.1 > -- To unsu
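Folding the review comments in (allocation moved out from under the spinlock, since kzalloc(GFP_KERNEL) may sleep, and a single unlock site reached by goto), vgic_map_phys_irq() might end up looking like this sketch:

int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
{
        struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
        struct rb_root *root;
        struct rb_node **new, *parent = NULL;
        struct irq_phys_map *new_map;
        int ret = 0;

        /* Allocate before taking the spinlock. */
        new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
        if (!new_map)
                return -ENOMEM;

        spin_lock(&dist->rb_tree_lock);

        root = vgic_get_irq_phys_map(vcpu, virt_irq);
        new = &root->rb_node;

        while (*new) {
                struct irq_phys_map *this;

                this = container_of(*new, struct irq_phys_map, node);
                parent = *new;
                if (this->virt_irq < virt_irq)
                        new = &(*new)->rb_left;
                else if (this->virt_irq > virt_irq)
                        new = &(*new)->rb_right;
                else {
                        ret = -EEXIST;
                        goto out;
                }
        }

        new_map->virt_irq = virt_irq;
        new_map->phys_irq = phys_irq;
        rb_link_node(&new_map->node, parent, new);
        rb_insert_color(&new_map->node, root);

out:
        spin_unlock(&dist->rb_tree_lock);
        if (ret)
                kfree(new_map); /* entry was never inserted */
        return ret;
}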
Re: [RFC v2 1/9] KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ
On Mon, Sep 01, 2014 at 02:52:40PM +0200, Eric Auger wrote: > Fix multiple injection of level sensitive forwarded IRQs. > With current code, the second injection fails since the state bitmaps > are not reset (process_maintenance is not called anymore). > New implementation consists in fully bypassing the vgic state > management for forwarded IRQ (checks are ignored in > vgic_update_irq_pending). This obviously assumes the forwarded IRQ is > injected from kernel side. > > Signed-off-by: Eric Auger > > --- > > It was attempted to reset the states in __kvm_vgic_sync_hwstate, checking > the emptied LR of forwarded IRQ. However surprisingly this solution does > not seem to work. Some times, a new forwarded IRQ injection is observed > while the LR of the previous instance was not observed as empty. hmmm, concerning. It would probably have been helpful overall if you could start by describing the problem with the current implementation in the commit message, and then explain the fix... > > v1 -> v2: > - fix vgic state bypass in vgic_queue_hwirq > --- > virt/kvm/arm/vgic.c | 13 ++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c > index 0007300..8ef495b 100644 > --- a/virt/kvm/arm/vgic.c > +++ b/virt/kvm/arm/vgic.c > @@ -1259,7 +1259,9 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int > irq) > > static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq) > { > - if (vgic_irq_is_queued(vcpu, irq)) > + bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) > 0); can you create a static function to factor this vgic_get_phys_irq check out, please? > + > + if (vgic_irq_is_queued(vcpu, irq) && !is_forwarded) > return true; /* level interrupt, already queued */ so essentially if an IRQ is already on a LR so we shouldn't resample the line, then we still resample the line if the IRQ is forwarded? I think you need to explain this, both to me here, and also in the code by moving the comment following the return statement above the check and comment this clearly. > > if (vgic_queue_irq(vcpu, 0, irq)) { > @@ -1517,14 +1519,18 @@ static bool vgic_update_irq_pending(struct kvm *kvm, > int cpuid, > int edge_triggered, level_triggered; > int enabled; > bool ret = true; > + bool is_forwarded; > > spin_lock(&dist->lock); > > vcpu = kvm_get_vcpu(kvm, cpuid); > + is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) > 0); use your new function here as well. > + > edge_triggered = vgic_irq_is_edge(vcpu, irq_num); > level_triggered = !edge_triggered; > > - if (!vgic_validate_injection(vcpu, irq_num, level)) { > + if (!is_forwarded && > + !vgic_validate_injection(vcpu, irq_num, level)) { I don't see the rationale here either. If an IRQ is forwarded, why do you need to do anything if the condition of the line hasn't changed for a level-triggered IRQ or if you have a falling edge on an edge-triggered IRQ (assuming active-HIGH)? > ret = false; > goto out; > } > @@ -1557,7 +1563,8 @@ static bool vgic_update_irq_pending(struct kvm *kvm, > int cpuid, > goto out; > } > > - if (level_triggered && vgic_irq_is_queued(vcpu, irq_num)) { > + if (!is_forwarded && > + level_triggered && vgic_irq_is_queued(vcpu, irq_num)) { So here it's making sense for SPIs since you can have an EOIed interrupt on a CPU that didn't exit the VM yet, and this it's still queued, but you still need to resample the line to respect other CPUs. Only, we ever only target a single CPU for SPIs IIRC (the first in the target list register) so we have to wait for that CPU to to exit the VM anyhow. 
This leads me to believe that, given a forwarded irq, you can only have three situations at this point: (1) is_queued && target_vcpu_in_vm: The vcpu should resample this line when it exits the VM, because we check the LRs for IRQs like this one, so we don't have to do anything and go to out here. (2) is_queued && !target_vcpu_in_vm: You have a bug because you exited the VM which must have done an EOI on the interrupt, otherwise this function shouldn't have been called! This means that we should have cleared the queued state of the interrupt. (3) !is_queued && whatever: Set the irq pending bits, so do not goto out. I'm aware that there's theoretically a race between (1) and (2), but you should consider target_cpu_in_vm as "it hasn't been through __kvm_vgic_sync_hwstate() yet" and this should hold. Tell me where this breaks? -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
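For completeness, the small helper requested earlier in this review might simply be the following (the name is illustrative):

static bool vgic_irq_is_forwarded(struct kvm_vcpu *vcpu, int irq)
{
        /* A positive value means a physical IRQ is mapped to this virtual IRQ. */
        return vgic_get_phys_irq(vcpu, irq) > 0;
}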
[PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast
Currently, we call ioapic_service() immediately when we find the irq is still active during eoi broadcast. But for real hardware, there's some dealy between the EOI writing and irq delivery (system bus latency?). So we need to emulate this behavior. Otherwise, for a guest who haven't register a proper irq handler , it would stay in the interrupt routine as this irq would be re-injected immediately after guest enables interrupt. This would lead guest can't move forward and may miss the possibility to get proper irq handler registered (one example is windows guest resuming from hibernation). As there's no way to differ the unhandled irq from new raised ones, this patch solve this problems by scheduling a delayed work when the count of irq injected during eoi broadcast exceeds a threshold value. After this patch, the guest can move a little forward when there's no suitable irq handler in case it may register one very soon and for guest who has a bad irq detection routine ( such as note_interrupt() in linux ), this bad irq would be recognized soon as in the past. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Signed-off-by: Zhang Haoyu --- include/trace/events/kvm.h | 20 ++ virt/kvm/ioapic.c | 51 -- virt/kvm/ioapic.h | 6 ++ 3 files changed, 75 insertions(+), 2 deletions(-) diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 908925a..b05f688 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -95,6 +95,26 @@ TRACE_EVENT(kvm_ioapic_set_irq, __entry->coalesced ? " (coalesced)" : "") ); +TRACE_EVENT(kvm_ioapic_delayed_eoi_inj, + TP_PROTO(__u64 e), + TP_ARGS(e), + + TP_STRUCT__entry( + __field(__u64, e ) + ), + + TP_fast_assign( + __entry->e = e; + ), + + TP_printk("dst %x vec=%u (%s|%s|%s%s)", + (u8)(__entry->e >> 56), (u8)__entry->e, + __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode), + (__entry->e & (1<<11)) ? "logical" : "physical", + (__entry->e & (1<<15)) ? "level" : "edge", + (__entry->e & (1<<16)) ? "|masked" : "") +); + TRACE_EVENT(kvm_msi_set_irq, TP_PROTO(__u64 address, __u64 data), TP_ARGS(address, data), diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..a36c4c4 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -405,6 +405,24 @@ void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id) spin_unlock(&ioapic->lock); } +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{ + int i; + struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, +eoi_inject.work); + spin_lock(&ioapic->lock); + for (i = 0; i < IOAPIC_NUM_PINS; i++) { + union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + + if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) + continue; + + if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) + ioapic_service(ioapic, i, false); + } + spin_unlock(&ioapic->lock); +} + static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, struct kvm_ioapic *ioapic, int vector, int trigger_mode) { @@ -435,8 +453,33 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); ent->fields.remote_irr = 0; - if (ioapic->irr & (1 << i)) - ioapic_service(ioapic, i, false); + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { + ++ioapic->irq_eoi[i]; + if (ioapic->irq_eoi[i] == IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) { + /* +* Real hardware does not deliver the irq so +* immediately during eoi broadcast, so we need +* to emulate this behavior. 
Otherwise, for +* guests who has not registered handler of a +* level irq, this irq would be injected +* immediately after guest enables interrupt +* (which happens usually at the end of the +* common interrupt routine). This would lead +* guest can't move forward and may miss the +* possibility to get proper irq handler +* registered. So we need to give some breath to +* guest. TODO: 1 is too long? +*/ +
Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote: > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote: > > This patch introduces a new KVM_DEV_VFIO_DEVICE attribute. > > > > This is a new control channel which enables KVM to cooperate with > > viable VFIO devices. > > > > The kvm-vfio device now holds a list of devices (kvm_vfio_device) > > in addition to a list of groups (kvm_vfio_group). The new > > infrastructure enables to check the validity of the VFIO device > > file descriptor, get and hold a reference to it. > > > > The first concrete implemented command is IRQ forward control: > > KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. > > > > It consists in programing the VFIO driver and KVM in a consistent manner > > so that an optimized IRQ injection/completion is set up. Each > > kvm_vfio_device holds a list of forwarded IRQ. When putting a > > kvm_vfio_device, the implementation makes sure the forwarded IRQs > > are set again in the normal handling state (non forwarded). > > 'putting a kvm_vfio_device' sounds to like you're golf'ing :) > > When a kvm_vfio_device is released? > > > > > The forwarding programmming is architecture specific, embodied by the > > kvm_arch_set_fwd_state function. Its implementation is given in a > > separate patch file. > > I would drop the last sentence and instead indicate that this is handled > properly when the architecture does not support such a feature. > > > > > The forwarding control modality is enabled by the > > __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define. > > > > Signed-off-by: Eric Auger > > > > --- > > > > v1 -> v2: > > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > > - original patch file separated into 2 parts: generic part moved in vfio.c > > and ARM specific part(kvm_arch_set_fwd_state) > > --- > > include/linux/kvm_host.h | 27 +++ > > virt/kvm/vfio.c | 452 > > ++- > > 2 files changed, 477 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index a4c33b3..24350dc 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -1065,6 +1065,21 @@ struct kvm_device_ops { > > unsigned long arg); > > }; > > > > +enum kvm_fwd_irq_action { > > + KVM_VFIO_IRQ_SET_FORWARD, > > + KVM_VFIO_IRQ_SET_NORMAL, > > + KVM_VFIO_IRQ_CLEANUP, > > This is KVM internal API, so it would probably be good to document this. > Especially the CLEANUP bit worries me, see below. This also doesn't match the user API, which is simply FORWARD/UNFORWARD. Extra states worry me too. > > +}; > > + > > +/* internal structure describing a forwarded IRQ */ > > +struct kvm_fwd_irq { > > + struct list_head link; > > this list entry is local to the kvm vfio device, right? that means you > probably want a struct with just the below fields, and then have a > containing struct in the generic device file, private to it's logic. Yes, this is part of the abstraction problem. > > + __u32 index; /* platform device irq index */ This is a vfio_device irq_index, but vfio_devices support indexes and sub-indexes. At this level the API should match vfio, not the specifics of platform devices not supporting sub-index. > > + __u32 hwirq; /*physical IRQ */ > > + __u32 gsi; /* virtual IRQ */ > > + struct kvm_vcpu *vcpu; /* vcpu to inject into*/ Not sure I understand why vcpu is necessary. Also I see a 'get' in the code below, but not a 'put'. 
> > +}; > > + > > void kvm_device_get(struct kvm_device *dev); > > void kvm_device_put(struct kvm_device *dev); > > struct kvm_device *kvm_device_from_filp(struct file *filp); > > @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops; > > extern struct kvm_device_ops kvm_arm_vgic_v2_ops; > > extern struct kvm_device_ops kvm_flic_ops; > > > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd, > > what's the 'p' in pfwd? p is for pointer? > > + enum kvm_fwd_irq_action action); > > + > > +#else > > +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd, > > +enum kvm_fwd_irq_action action) > > +{ > > + return 0; > > +} > > +#endif > > + > > #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT > > > > static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool > > val) > > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > > index 76dc7a1..e4a81c4 100644 > > --- a/virt/kvm/vfio.c > > +++ b/virt/kvm/vfio.c > > @@ -18,14 +18,24 @@ > > #include > > #include > > #include > > +#include > > > > struct kvm_vfio_group { > > struct list_head node; > > struct vfio_group *vfio_group; > > }; > > > > +struct kvm_vfio_device { > > + struct list_head node; > > + struct vfio_device *vfio_device; > > + /* list of forwarded IRQs for that VFIO device */ > > + struct list_
Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote: > On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote: > > On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote: > > > This RFC proposes an integration of "ARM: Forwarding physical > > > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in > > > KVM. > > > > > > It enables to transform a VFIO platform driver IRQ into a forwarded > > > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM > > > switch can be avoided on guest virtual IRQ completion. Before this > > > patch, a maintenance IRQ was triggered on the virtual IRQ completion. > > > > > > When the IRQ is forwarded, the VFIO platform driver does not need to > > > disable the IRQ anymore. Indeed when returning from the IRQ handler > > > the IRQ is not deactivated. Only its priority is lowered. This means > > > the same IRQ cannot hit before the guest completes the virtual IRQ > > > and the GIC automatically deactivates the corresponding physical IRQ. > > > > > > Besides, the injection still is based on irqfd triggering. The only > > > impact on irqfd process is resamplefd is not called anymore on > > > virtual IRQ completion since this latter becomes "transparent". > > > > > > The current integration is based on an extension of the KVM-VFIO > > > device, previously used by KVM to interact with VFIO groups. The > > > patch serie now enables KVM to directly interact with a VFIO > > > platform device. The VFIO external API was extended for that purpose. > > > > > > Th KVM-VFIO device can get/put the vfio platform device, check its > > > integrity and type, get the IRQ number associated to an IRQ index. > > > > > > The IRQ forward programming is architecture specific (virtual interrupt > > > controller programming basically). However the whole infrastructure is > > > kept generic. > > > > > > from a user point of view, the functionality is provided through new > > > KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ > > > and the capability can be checked with KVM_HAS_DEVICE_ATTR. > > > Assignment can only be changed when the physical IRQ is not active. > > > It is the responsability of the user to do this check. > > > > > > This patch serie has the following dependencies: > > > - "ARM: Forwarding physical interrupts to a guest VM" > > > (http://lwn.net/Articles/603514/) in > > > - [PATCH v3] irqfd for ARM > > > - and obviously the VFIO platform driver serie: > > > [RFC PATCH v6 00/20] VFIO support for platform devices on ARM > > > https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html > > > > > > Integrated pieces can be found at > > > ssh://git.linaro.org/people/eric.auger/linux.git > > > on branch 3.17rc3_irqfd_forward_integ_v2 > > > > > > This was was tested on Calxeda Midway, assigning the xgmac main IRQ. > > > > > > v1 -> v2: > > > - forward control is moved from architecture specific file into generic > > > vfio.c module. 
> > > only kvm_arch_set_fwd_state remains architecture specific > > > - integrate Kim's patch which enables KVM-VFIO for ARM > > > - fix vgic state bypass in vgic_queue_hwirq > > > - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h > > > to include/uapi/linux/kvm.h > > > also irq_index renamed into index and guest_irq renamed into gsi > > > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD > > > - vfio_external_get_base_device renamed into vfio_external_base_device > > > - vfio_external_get_type removed > > > - kvm_vfio_external_get_base_device renamed into > > > kvm_vfio_external_base_device > > > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > > > > > > Eric Auger (8): > > > KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded > > > IRQ > > > KVM: ARM: VGIC: add forwarded irq rbtree lock > > > VFIO: platform: handler tests whether the IRQ is forwarded > > > KVM: KVM-VFIO: update user API to program forwarded IRQ > > > VFIO: Extend external user API > > > KVM: KVM-VFIO: add new VFIO external API hooks > > > KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding > > > control > > > KVM: KVM-VFIO: ARM forwarding control > > > > > > Kim Phillips (1): > > > ARM: KVM: Enable the KVM-VFIO device > > > > > > Documentation/virtual/kvm/devices/vfio.txt | 26 ++ > > > arch/arm/include/asm/kvm_host.h| 7 + > > > arch/arm/kvm/Kconfig | 1 + > > > arch/arm/kvm/Makefile | 4 +- > > > arch/arm/kvm/kvm_vfio_arm.c| 85 + > > > drivers/vfio/platform/vfio_platform_irq.c | 7 +- > > > drivers/vfio/vfio.c| 24 ++ > > > include/kvm/arm_vgic.h | 1 + > > > include/linux/kvm_host.h | 27 ++ > > > include/linux/vfio.h | 3 + > > > include/uapi/linux/kvm.h | 9 + > > > virt/kvm/arm/vgic.c| 59 +++
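For readers wanting to see how the new commands would be driven from userspace, below is a minimal sketch against KVM's generic device-attr interface. It only illustrates the flow described in the cover letter: the KVM_DEV_VFIO_DEVICE group, the KVM_DEV_VFIO_DEVICE_FORWARD_IRQ attribute and the kvm_arch_forwarded_irq field layout are assumptions taken from the RFC text, not from a merged kernel UAPI.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical UAPI struct: fields follow the cover letter (VFIO device fd,
 * IRQ index on that device, guest GSI); the real layout is defined by the
 * series, not by an upstream header. */
struct kvm_arch_forwarded_irq {
	uint32_t fd;
	uint32_t index;
	uint32_t gsi;
};

/* kvm_vfio_devfd is the fd returned by KVM_CREATE_DEVICE for the KVM-VFIO
 * device; vfio_devfd is the VFIO platform device fd. */
static int forward_irq(int kvm_vfio_devfd, int vfio_devfd,
		       unsigned int index, unsigned int gsi)
{
	struct kvm_arch_forwarded_irq fwd = {
		.fd = vfio_devfd, .index = index, .gsi = gsi,
	};
	struct kvm_device_attr attr = {
		.group = KVM_DEV_VFIO_DEVICE,             /* assumed constant */
		.attr  = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, /* assumed constant */
		.addr  = (uint64_t)(unsigned long)&fwd,
	};

	/* The cover letter says support can be probed with KVM_HAS_DEVICE_ATTR. */
	if (ioctl(kvm_vfio_devfd, KVM_HAS_DEVICE_ATTR, &attr))
		return -1;
	return ioctl(kvm_vfio_devfd, KVM_SET_DEVICE_ATTR, &attr);
}

Unforwarding would presumably use the corresponding KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ attribute with the same payload.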
[PATCH v5 5/7] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.
This patch only handles the "L1 and L2 vm share one apic access page" situation. When L1 vm is running, if the shared apic access page is migrated, mmu_notifier will request all vcpus to exit to L0, and reload the apic access page physical address for all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, L2's vmcs will be updated in prepare_vmcs02() called by nested_vmx_run(). So nothing needs to be done there. When L2 vm is running, if the shared apic access page is migrated, mmu_notifier will request all vcpus to exit to L0, and reload the apic access page physical address for all L2 vmcs. This patch therefore requests an apic access page reload in the L2->L1 vmexit. Signed-off-by: Tang Chen --- arch/x86/kvm/vmx.c | 7 +++ virt/kvm/kvm_main.c | 1 + 2 files changed, 8 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index da6d55d..e7704b2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8796,6 +8796,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, } /* +* Do not call kvm_reload_apic_access_page() because we are now +* running, mmu_notifier will force to reload the page's hpa for L2 +* vmcs. Need to reload it for L1 before entering L1. +*/ + kvm_reload_apic_access_page(vcpu->kvm); + + /* * Exiting from L2 to L1, we're now back to L1 which thinks it just * finished a VMLAUNCH or VMRESUME instruction, so we need to set the * success or failure flag accordingly. diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d8280de..784127e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -214,6 +214,7 @@ void kvm_reload_apic_access_page(struct kvm *kvm) { make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); } +EXPORT_SYMBOL_GPL(kvm_reload_apic_access_page); int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) { -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
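Taken together with patch 4/7 (which appears further down in this archive), the reload path the changelog describes boils down to a standard vcpu-request pattern. The following is only a condensed restatement of code already present in those two patches, not anything new:

/* Producer side (virt/kvm/kvm_main.c): ask every vcpu to refresh the page. */
void kvm_reload_apic_access_page(struct kvm *kvm)
{
	make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
}

/* Consumer side (arch/x86/kvm/x86.c, vcpu_enter_guest(), from patch 4/7):
 * each vcpu re-reads the page's hpa and rewrites APIC_ACCESS_ADDR before
 * re-entering the guest. */
if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
	vcpu_reload_apic_access_page(vcpu);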
[PATCH v5 6/7] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.
To make apic access page migratable, we do not pin it in memory now. When it is migrated, we should reload its physical address for all vmcses. But when we tried to do this, all vcpu will access kvm_arch->apic_access_page without any locking. This is not safe. Actually, we do not need kvm_arch->apic_access_page anymore. Since apic access page is not pinned in memory now, we can remove kvm_arch->apic_access_page. When we need to write its physical address into vmcs, use gfn_to_page() to get its page struct, which will also pin it. And unpin it after then. Suggested-by: Gleb Natapov Signed-off-by: Tang Chen --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/vmx.c | 32 +--- arch/x86/kvm/x86.c | 15 +-- 3 files changed, 35 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 514183e..70f0d2d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -576,7 +576,7 @@ struct kvm_arch { struct kvm_apic_map *apic_map; unsigned int tss_addr; - struct page *apic_access_page; + bool apic_access_page_done; gpa_t wall_clock; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e7704b2..058c373 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm) int r = 0; mutex_lock(&kvm->slots_lock); - if (kvm->arch.apic_access_page) + if (kvm->arch.apic_access_page_done) goto out; kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT; kvm_userspace_mem.flags = 0; @@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm) goto out; } - kvm->arch.apic_access_page = page; + /* +* Do not pin apic access page in memory so that memory hotplug +* process is able to migrate it. +*/ + put_page(page); + kvm->arch.apic_access_page_done = true; out: mutex_unlock(&kvm->slots_lock); return r; @@ -4536,9 +4541,16 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmcs_write32(TPR_THRESHOLD, 0); } - if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) - vmcs_write64(APIC_ACCESS_ADDR, - page_to_phys(vmx->vcpu.kvm->arch.apic_access_page)); + if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) { + struct page *page = gfn_to_page(vmx->vcpu.kvm, + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); + vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page)); + /* +* Do not pin apic access page in memory so that memory +* hotplug process is able to migrate it. +*/ + put_page(page); + } if (vmx_vm_has_apicv(vcpu->kvm)) memset(&vmx->pi_desc, 0, sizeof(struct pi_desc)); @@ -7994,10 +8006,16 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(vmx->nested.apic_access_page)); } else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) { + struct page *page = gfn_to_page(vmx->vcpu.kvm, + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); exec_control |= SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; - vmcs_write64(APIC_ACCESS_ADDR, - page_to_phys(vcpu->kvm->arch.apic_access_page)); + vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page)); + /* +* Do not pin apic access page in memory so that memory +* hotplug process is able to migrate it. 
+*/ + put_page(page); } vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 96f4188..6da0b93 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) { + struct page *page; + /* * apic access page could be migrated. When the page is being migrated, * GUP will wait till the migrate entry is replaced with the new pte * entry pointing to the new page. */ - vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm, - APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); - kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, - page_to_phys(vcpu->kvm->arch.apic_access_page)); + page = gfn
[PATCH v5 4/7] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().
apic access page is pinned in memory. As a result, it cannot be migrated/hot-removed. Actually, it is not necessary to be pinned. The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the corresponding ept entry. This patch introduces a new vcpu request named KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time, and force all the vcpus exit guest, and re-enter guest till they updates the VMCS APIC_ACCESS_ADDR pointer to the new apic access page address, and updates kvm->arch.apic_access_page to the new page. Signed-off-by: Tang Chen --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm.c | 6 ++ arch/x86/kvm/vmx.c | 6 ++ arch/x86/kvm/x86.c | 15 +++ include/linux/kvm_host.h| 2 ++ virt/kvm/kvm_main.c | 12 6 files changed, 42 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 35171c7..514183e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -739,6 +739,7 @@ struct kvm_x86_ops { void (*hwapic_isr_update)(struct kvm *kvm, int isr); void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set); + void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa); void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector); void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 1d941ad..f2eacc4 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set) return; } +static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa) +{ + return; +} + static int svm_vm_has_apicv(struct kvm *kvm) { return 0; @@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = { .enable_irq_window = enable_irq_window, .update_cr8_intercept = update_cr8_intercept, .set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode, + .set_apic_access_page_addr = svm_set_apic_access_page_addr, .vm_has_apicv = svm_vm_has_apicv, .load_eoi_exitmap = svm_load_eoi_exitmap, .hwapic_isr_update = svm_hwapic_isr_update, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 63c4c3e..da6d55d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set) vmx_set_msr_bitmap(vcpu); } +static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa) +{ + vmcs_write64(APIC_ACCESS_ADDR, hpa); +} + static void vmx_hwapic_isr_update(struct kvm *kvm, int isr) { u16 status; @@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = { .enable_irq_window = enable_irq_window, .update_cr8_intercept = update_cr8_intercept, .set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode, + .set_apic_access_page_addr = vmx_set_apic_access_page_addr, .vm_has_apicv = vmx_vm_has_apicv, .load_eoi_exitmap = vmx_load_eoi_exitmap, .hwapic_irr_update = vmx_hwapic_irr_update, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e05bd58..96f4188 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) kvm_apic_update_tmr(vcpu, tmr); } +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) +{ + /* +* apic access page could be migrated. 
When the page is being migrated, +* GUP will wait till the migrate entry is replaced with the new pte +* entry pointing to the new page. +*/ + vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm, + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); + kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, + page_to_phys(vcpu->kvm->arch.apic_access_page)); +} + /* * Returns 1 to let __vcpu_run() continue the guest execution loop without * exiting to the userspace. Otherwise, the value will be returned to the @@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_deliver_pmi(vcpu); if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu)) vcpu_scan_ioapic(vcpu); + if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu)) + vcpu_reload_apic_access_page(vcpu); } if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a4c33b3..8be076a 100644 --- a/includ
[PATCH v5 3/7] kvm: Make init_rmode_identity_map() return 0 on success.
In init_rmode_identity_map(), there are two variables indicating the return value, r and ret, and it returns 0 on error, 1 on success. The function is only called by vmx_create_vcpu(), and r is redundant. This patch removes the redundant variable r, and makes init_rmode_identity_map() return 0 on success, -errno on failure. Signed-off-by: Tang Chen --- arch/x86/kvm/vmx.c | 25 +++-- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 953d529..63c4c3e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3939,45 +3939,42 @@ out: static int init_rmode_identity_map(struct kvm *kvm) { - int i, idx, r, ret = 0; + int i, idx, ret = 0; pfn_t identity_map_pfn; u32 tmp; if (!enable_ept) - return 1; + return 0; /* Protect kvm->arch.ept_identity_pagetable_done. */ mutex_lock(&kvm->slots_lock); - if (likely(kvm->arch.ept_identity_pagetable_done)) { - ret = 1; + if (likely(kvm->arch.ept_identity_pagetable_done)) goto out2; - } identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT; - r = alloc_identity_pagetable(kvm); - if (r) + ret = alloc_identity_pagetable(kvm); + if (ret) goto out2; idx = srcu_read_lock(&kvm->srcu); - r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE); - if (r < 0) + ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE); + if (ret) goto out; /* Set up identity-mapping pagetable for EPT in real mode */ for (i = 0; i < PT32_ENT_PER_PAGE; i++) { tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); - r = kvm_write_guest_page(kvm, identity_map_pfn, + ret = kvm_write_guest_page(kvm, identity_map_pfn, &tmp, i * sizeof(tmp), sizeof(tmp)); - if (r < 0) + if (ret) goto out; } kvm->arch.ept_identity_pagetable_done = true; - ret = 1; + out: srcu_read_unlock(&kvm->srcu, idx); - out2: mutex_unlock(&kvm->slots_lock); return ret; @@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) kvm->arch.ept_identity_map_addr = VMX_EPT_IDENTITY_PAGETABLE_ADDR; err = -ENOMEM; - if (!init_rmode_identity_map(kvm)) + if (init_rmode_identity_map(kvm)) goto free_vmcs; } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 7/7] kvm, mem-hotplug: Unpin and remove nested_vmx->apic_access_page.
Just like we removed kvm_arch->apic_access_page, nested_vmx->apic_access_page becomes useless for the same reason. This patch removes nested_vmx->apic_access_page, and use gfn_to_page() to pin it in memory when we need it, and unpin it after then. Signed-off-by: Tang Chen --- arch/x86/kvm/vmx.c | 31 +-- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 058c373..4aa73cb 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -374,11 +374,6 @@ struct nested_vmx { u64 vmcs01_tsc_offset; /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending; - /* -* Guest pages referred to in vmcs02 with host-physical pointers, so -* we must keep them pinned while L2 runs. -*/ - struct page *apic_access_page; u64 msr_ia32_feature_control; struct hrtimer preemption_timer; @@ -6154,11 +6149,6 @@ static void free_nested(struct vcpu_vmx *vmx) nested_release_vmcs12(vmx); if (enable_shadow_vmcs) free_vmcs(vmx->nested.current_shadow_vmcs); - /* Unpin physical memory we referred to in current vmcs02 */ - if (vmx->nested.apic_access_page) { - nested_release_page(vmx->nested.apic_access_page); - vmx->nested.apic_access_page = 0; - } nested_free_all_saved_vmcss(vmx); } @@ -7983,28 +7973,31 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) exec_control |= vmcs12->secondary_vm_exec_control; if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) { + struct page *page; /* * Translate L1 physical address to host physical * address for vmcs02. Keep the page pinned, so this * physical address remains valid. We keep a reference * to it so we can release it later. */ - if (vmx->nested.apic_access_page) /* shouldn't happen */ - nested_release_page(vmx->nested.apic_access_page); - vmx->nested.apic_access_page = - nested_get_page(vcpu, vmcs12->apic_access_addr); + page = nested_get_page(vcpu, vmcs12->apic_access_addr); /* * If translation failed, no matter: This feature asks * to exit when accessing the given address, and if it * can never be accessed, this feature won't do * anything anyway. */ - if (!vmx->nested.apic_access_page) + if (!page) exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; else vmcs_write64(APIC_ACCESS_ADDR, - page_to_phys(vmx->nested.apic_access_page)); +page_to_phys(page)); + /* +* Do not pin nested vm's apic access page in memory so +* that memory hotplug process is able to migrate it. +*/ + put_page(page); } else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) { struct page *page = gfn_to_page(vmx->vcpu.kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); @@ -8807,12 +8800,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, /* This is needed for same reason as it was needed in prepare_vmcs02 */ vmx->host_rsp = 0; - /* Unpin physical memory we referred to in vmcs02 */ - if (vmx->nested.apic_access_page) { - nested_release_page(vmx->nested.apic_access_page); - vmx->nested.apic_access_page = 0; - } - /* * Do not call kvm_reload_apic_access_page() because we are now * running, mmu_notifier will force to reload the page's hpa for L2 -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 2/7] kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But it is never used to refer to the page at all. In vcpu initialization, it indicates two things: 1. indicates if ept page is allocated 2. indicates if a memory slot for identity page is initialized Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept identity pagetable is initialized. So we can remove ept_identity_pagetable. NOTE: In the original code, ept identity pagetable page is pinned in memroy. As a result, it cannot be migrated/hot-removed. After this patch, since kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page is no longer pinned in memory. And it can be migrated/hot-removed. Signed-off-by: Tang Chen Reviewed-by: Gleb Natapov --- arch/x86/include/asm/kvm_host.h | 1 - arch/x86/kvm/vmx.c | 50 - arch/x86/kvm/x86.c | 2 -- 3 files changed, 25 insertions(+), 28 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7c492ed..35171c7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -580,7 +580,6 @@ struct kvm_arch { gpa_t wall_clock; - struct page *ept_identity_pagetable; bool ept_identity_pagetable_done; gpa_t ept_identity_map_addr; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 4b80ead..953d529 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var); static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu); static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx); static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx); +static int alloc_identity_pagetable(struct kvm *kvm); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3938,21 +3939,27 @@ out: static int init_rmode_identity_map(struct kvm *kvm) { - int i, idx, r, ret; + int i, idx, r, ret = 0; pfn_t identity_map_pfn; u32 tmp; if (!enable_ept) return 1; - if (unlikely(!kvm->arch.ept_identity_pagetable)) { - printk(KERN_ERR "EPT: identity-mapping pagetable " - "haven't been allocated!\n"); - return 0; + + /* Protect kvm->arch.ept_identity_pagetable_done. */ + mutex_lock(&kvm->slots_lock); + + if (likely(kvm->arch.ept_identity_pagetable_done)) { + ret = 1; + goto out2; } - if (likely(kvm->arch.ept_identity_pagetable_done)) - return 1; - ret = 0; + identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT; + + r = alloc_identity_pagetable(kvm); + if (r) + goto out2; + idx = srcu_read_lock(&kvm->srcu); r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE); if (r < 0) @@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm) ret = 1; out: srcu_read_unlock(&kvm->srcu, idx); + +out2: + mutex_unlock(&kvm->slots_lock); return ret; } @@ -4019,31 +4029,23 @@ out: static int alloc_identity_pagetable(struct kvm *kvm) { - struct page *page; + /* +* In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done +* is checked before calling this function and set to true after the +* calling. The access to kvm->arch.ept_identity_pagetable_done should +* be protected by kvm->slots_lock. 
+*/ + struct kvm_userspace_memory_region kvm_userspace_mem; int r = 0; - mutex_lock(&kvm->slots_lock); - if (kvm->arch.ept_identity_pagetable) - goto out; kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT; kvm_userspace_mem.flags = 0; kvm_userspace_mem.guest_phys_addr = kvm->arch.ept_identity_map_addr; kvm_userspace_mem.memory_size = PAGE_SIZE; r = __kvm_set_memory_region(kvm, &kvm_userspace_mem); - if (r) - goto out; - page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT); - if (is_error_page(page)) { - r = -EFAULT; - goto out; - } - - kvm->arch.ept_identity_pagetable = page; -out: - mutex_unlock(&kvm->slots_lock); return r; } @@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) kvm->arch.ept_identity_map_addr = VMX_EPT_IDENTITY_PAGETABLE_ADDR; err = -ENOMEM; - if (alloc_identity_pagetable(kvm) != 0) - goto free_vmcs; if (!init_rmode_identity_map(kvm)) goto free_vmcs; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86
[PATCH v5 0/7] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.
ept identity pagetable and apic access page in kvm are pinned in memory. As a result, they cannot be migrated/hot-removed. But actually they don't need to be pinned in memory. [For ept identity page] Just do not pin it. When it is migrated, guest will be able to find the new page in the next ept violation. [For apic access page] The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer for each vcpu in addition. NOTE: Tested with -cpu xxx,-x2apic option. But since nested vm pins some other pages in memory, if user uses nested vm, memory hot-remove will not work. Change log v4 -> v5: 1. Patch 5/7: Call kvm_reload_apic_access_page() unconditionally in nested_vmx_vmexit(). (From Gleb Natapov ) 2. Patch 6/7: Remove kvm_arch->apic_access_page. (From Gleb Natapov ) 3. Patch 7/7: Remove nested_vmx->apic_access_page. Tang Chen (7): kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address. kvm: Remove ept_identity_pagetable from struct kvm_arch. kvm: Make init_rmode_identity_map() return 0 on success. kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest(). kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running. kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page. kvm, mem-hotplug: Unpin and remove nested_vmx->apic_access_page. arch/x86/include/asm/kvm_host.h | 4 +- arch/x86/kvm/svm.c | 9 ++- arch/x86/kvm/vmx.c | 139 ++-- arch/x86/kvm/x86.c | 24 +-- include/linux/kvm_host.h| 2 + virt/kvm/kvm_main.c | 13 6 files changed, 122 insertions(+), 69 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 1/7] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the address of apic access page. So use this macro. Signed-off-by: Tang Chen Reviewed-by: Gleb Natapov --- arch/x86/kvm/svm.c | 3 ++- arch/x86/kvm/vmx.c | 6 +++--- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ddf7427..1d941ad 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id) svm->asid_generation = 0; init_vmcb(svm); - svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE; + svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | + MSR_IA32_APICBASE_ENABLE; if (kvm_vcpu_is_bsp(&svm->vcpu)) svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index bfe11cf..4b80ead 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm) goto out; kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT; kvm_userspace_mem.flags = 0; - kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL; + kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE; kvm_userspace_mem.memory_size = PAGE_SIZE; r = __kvm_set_memory_region(kvm, &kvm_userspace_mem); if (r) goto out; - page = gfn_to_page(kvm, 0xfee00); + page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); if (is_error_page(page)) { r = -EFAULT; goto out; @@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val(); kvm_set_cr8(&vmx->vcpu, 0); - apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE; + apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; if (kvm_vcpu_is_bsp(&vmx->vcpu)) apic_base_msr.data |= MSR_IA32_APICBASE_BSP; apic_base_msr.host_initiated = true; -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast
On 2014-09-11 07:06, Zhang Haoyu wrote: > Currently, we call ioapic_service() immediately when we find the irq is still > active during eoi broadcast. But for real hardware, there's some delay between > the EOI writing and irq delivery (system bus latency?). So we need to emulate > this behavior. Otherwise, for a guest that hasn't registered a proper irq > handler, it would stay in the interrupt routine as this irq would be re-injected > immediately after the guest enables interrupts. This would prevent the guest from moving > forward and it may miss the chance to get a proper irq handler registered (one > example is a windows guest resuming from hibernation). > > As there's no way to tell an unhandled irq from newly raised ones, this patch > solves this problem by scheduling a delayed work when the count of irqs injected > during eoi broadcast exceeds a threshold value. After this patch, the guest can > move a little forward when there's no suitable irq handler, in case it may > register one very soon, and for a guest that has a bad irq detection routine (such > as note_interrupt() in linux), this bad irq would be recognized soon, as in the > past. > > Cc: Michael S. Tsirkin > Signed-off-by: Jason Wang > Signed-off-by: Zhang Haoyu > --- > include/trace/events/kvm.h | 20 ++ > virt/kvm/ioapic.c | 51 > -- > virt/kvm/ioapic.h | 6 ++ > 3 files changed, 75 insertions(+), 2 deletions(-) > > diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h > index 908925a..b05f688 100644 > --- a/include/trace/events/kvm.h > +++ b/include/trace/events/kvm.h > @@ -95,6 +95,26 @@ TRACE_EVENT(kvm_ioapic_set_irq, > __entry->coalesced ? " (coalesced)" : "") > ); > > +TRACE_EVENT(kvm_ioapic_delayed_eoi_inj, > + TP_PROTO(__u64 e), > + TP_ARGS(e), > + > + TP_STRUCT__entry( > + __field(__u64, e ) > + ), > + > + TP_fast_assign( > + __entry->e = e; > + ), > + > + TP_printk("dst %x vec=%u (%s|%s|%s%s)", > + (u8)(__entry->e >> 56), (u8)__entry->e, > + __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode), > + (__entry->e & (1<<11)) ? "logical" : "physical", > + (__entry->e & (1<<15)) ? "level" : "edge", > + (__entry->e & (1<<16)) ? "|masked" : "") > +); > + > TRACE_EVENT(kvm_msi_set_irq, > TP_PROTO(__u64 address, __u64 data), > TP_ARGS(address, data), > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index e8ce34c..a36c4c4 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -405,6 +405,24 @@ void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int > irq_source_id) > spin_unlock(&ioapic->lock); > } > > +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) > +{ > + int i; > + struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, > + eoi_inject.work); > + spin_lock(&ioapic->lock); > + for (i = 0; i < IOAPIC_NUM_PINS; i++) { > + union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; > + > + if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) > + continue; > + > + if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) > + ioapic_service(ioapic, i, false); > + } > + spin_unlock(&ioapic->lock); > +} > + > static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, > struct kvm_ioapic *ioapic, int vector, int trigger_mode) > { > @@ -435,8 +453,33 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu > *vcpu, > > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); > ent->fields.remote_irr = 0; > - if (ioapic->irr & (1 << i)) > - ioapic_service(ioapic, i, false); > + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { The mask check is new - why now? 
You don't check it in the work handler as well. > + ++ioapic->irq_eoi[i]; > + if (ioapic->irq_eoi[i] == > IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) { > + /* > + * Real hardware does not deliver the irq so > + * immediately during eoi broadcast, so we need > + * to emulate this behavior. Otherwise, for > + * guests who has not registered handler of a > + * level irq, this irq would be injected > + * immediately after guest enables interrupt > + * (which happens usually at the end of the > + * common interrupt routine). This would lead > + * guest can't move forward and may mis
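For clarity, the parallel change Jan is asking about would look roughly like this in the work handler; this is only a sketch of the review point, not code from the posted patch:

static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
{
	int i;
	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
						 eoi_inject.work);

	spin_lock(&ioapic->lock);
	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];

		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
			continue;

		/* Mirror the mask check added to __kvm_ioapic_update_eoi(). */
		if (!ent->fields.mask && (ioapic->irr & (1 << i)) &&
		    !ent->fields.remote_irr)
			ioapic_service(ioapic, i, false);
	}
	spin_unlock(&ioapic->lock);
}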
Re: [PATCH] KVM: ioapic: add check for RTC_GSI
On 2014-09-10 18:53, Chris J Arges wrote: > cppcheck found the following error: > [ioapic.c:114]: (error) Array index -1 is out of bounds. > > If CONFIG_X86 is not defined, RTC_GSI == -1U which means that an out of bounds That issue is limited to the dead ia64 architecture; no other architecture builds ioapic.c (besides x86, of course). Maybe finally remove KVM support for that arch instead of bloating otherwise correct code? Jan > error could occur when accessing &ioapic->redirtbl[RTC_GSI]. > > This patch adds a check to kvm_rtc_eoi_tracking_restore_one that is similar to > how kvm_rtc_eoi_tracking_restore_all checks this condition. > > Signed-off-by: Chris J Arges > --- > virt/kvm/ioapic.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index e8ce34c..859f5b8 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -136,6 +136,9 @@ void kvm_rtc_eoi_tracking_restore_one(struct kvm_vcpu > *vcpu) > { > struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic; > > + if (RTC_GSI >= IOAPIC_NUM_PINS) > + return; > + > spin_lock(&ioapic->lock); > __rtc_irq_eoi_tracking_restore_one(vcpu); > spin_unlock(&ioapic->lock); > -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] virtio-rng cleanup: move some code out of mutex protection
On (Wed) 10 Sep 2014 [17:07:06], Amos Kong wrote: > It doesn't save too much cpu time as expected, just a cleanup. Frankly I won't bother with this. It doesn't completely remove all copying from the mutex, so it's not worthwhile. > Signed-off-by: Amos Kong > --- > drivers/char/hw_random/core.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index aa30a25..c591d7e 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, > return -ERESTARTSYS; > if (current_rng) > name = current_rng->name; > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > mutex_unlock(&rng_mutex); > + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > return ret; > } > @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device > *dev, > ssize_t ret = 0; > struct hwrng *rng; > > + buf[0] = '\0'; > err = mutex_lock_interruptible(&rng_mutex); > if (err) > return -ERESTARTSYS; > - buf[0] = '\0'; > list_for_each_entry(rng, &rng_list, list) { > strncat(buf, rng->name, PAGE_SIZE - ret - 1); > ret += strlen(rng->name); > strncat(buf, " ", PAGE_SIZE - ret - 1); > ret++; > } > + mutex_unlock(&rng_mutex); > strncat(buf, "\n", PAGE_SIZE - ret - 1); > ret++; > - mutex_unlock(&rng_mutex); > > return ret; > } > -- > 1.9.3 > Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
On (Wed) 10 Sep 2014 [17:07:07], Amos Kong wrote: > When I check hwrng attributes in sysfs, the cat process always gets > stuck if the guest has only 1 vcpu and uses a slow rng backend. > > Currently we check if there are any tasks waiting to be run on the > current cpu in rng_dev_read() by need_resched(). But need_resched() > doesn't work because rng_dev_read() is executing in user context. > > This patch removes the need_resched() check and increases the delay to 10 jiffies, > so other tasks get a chance to execute the protected code. > Delaying 1 jiffy also works, but 10 jiffies is safer. I'd prefer two patches for this one: one to remove the need_resched() check, and the other to increase the timeout. Anyway, Reviewed-by: Amit Shah > > Signed-off-by: Amos Kong > --- > drivers/char/hw_random/core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index c591d7e..b5d1b6f 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char > __user *buf, > > mutex_unlock(&rng_mutex); > > - if (need_resched()) > - schedule_timeout_interruptible(1); > + schedule_timeout_interruptible(10); > > if (signal_pending(current)) { > err = -ERESTARTSYS; > -- > 1.9.3 > Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
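For illustration, the two-patch split Amit asks for above could carry one hunk each, along these lines (only a sketch of the suggestion, not patches that were actually posted):

First patch, drop the need_resched() check while keeping the 1-jiffy sleep:

-	if (need_resched())
-		schedule_timeout_interruptible(1);
+	schedule_timeout_interruptible(1);

Second patch, bump the now-unconditional sleep to 10 jiffies:

-	schedule_timeout_interruptible(1);
+	schedule_timeout_interruptible(10);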
Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
On (Sun) 07 Sep 2014 [17:46:26], Zhang Haoyu wrote: > Hi, Paolo, Amit, > any ideas? I'll check this, thanks for testing with Linux guests. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html