date:20141120

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Laszlo Ersek

On 11/20/14 00:32, Mario Smarduch wrote:
> Hi Laszlo,
> 
> couple observations.
> 
>  I'm wondering if access from qemu and guest won't
> result in mixed memory attributes and if that's acceptable
> to the CPU.

Normally this would be a problem I think (Jon raised the topic of live
migration). However, for flash programming specifically, I think the
guest's access pattern ensures that we'll see things OK.

When the guest issues the first write access, the memslot is deleted,
and everything is forwarded to qemu, both reads and writes. In response
qemu modifies the array that *otherwise* backs the flash. These
modifications by qemu end up in the dcache mostly. When the guest is
done "programming", it writes a special command (read array mode) at
which point the memslot is recreated (as read-only) and flushed / set up
for flushing during demand paging.

So from the emulated flash POV, the memslot either doesn't exist at all
(and then qemu serves all accesses just fine), or it exists r/o, at
which point qemu (host userspace) will have stopped writing to it, and
will have set it up for flushing before and during guest read accesses.

> Also is if you update memory from qemu you may break
> dirty page logging/migration.

Very probably. Jon said the same thing.

> Unless there is some other way
> you keep track. Of course it may not be applicable in your
> case (i.e. flash unused after boot).

The flash *is* used after boot, because the UEFI runtime variable
services *are* exercised by the guest kernel. However those use the same
access pattern (it's the same set of UEFI services just called by a
different "client").

*Uncoordinated* access from guest and host in parallel will be a big
problem; but we're not that far yet, and we need to get the flash
problem sorted, so that we can at least boot and work on the basic
stuff. The flash programming dance happens to provide coordination; the
flash mode changes (which are equivalent to the teardown and the
recreation of the memslot) can be considered barriers.

I hope this is acceptable for the time being...

Thanks
Laszlo

> 
> - Mario
> 
> On 11/17/2014 07:49 AM, Laszlo Ersek wrote:
>> On 11/17/14 16:29, Paolo Bonzini wrote:
>>>
>>>
>>> On 17/11/2014 15:58, Ard Biesheuvel wrote:
 Readonly memslots are often used to implement emulation of ROMs and
 NOR flashes, in which case the guest may legally map these regions as
 uncached.
 To deal with the incoherency associated with uncached guest mappings,
 treat all readonly memslots as incoherent, and ensure that pages that
 belong to regions tagged as such are flushed to DRAM before being passed
 to the guest.
>>>
>>> On x86, the processor combines the cacheability values from the two
>>> levels of page tables.  Is there no way to do the same on ARM?
>>
>> Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict
>> (Device non-Gathering, non-Reordering, no Early Write Acknowledgement --
>> for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax
>> host) memory attributes.
>>
>> When qemu writes, as part of emulating the flash programming commands,
>> to the RAMBlock that *otherwise* backs the flash range (as a r/o
>> memslot), those writes (from host userspace) tend to end up in dcache.
>>
>> But, when the guest flips back the flash to romd mode, and tries to read
>> back the values from the flash as plain ROM, the dcache is completely
>> bypassed due to the strict stage1 mapping, and the guest goes directly
>> to DRAM.
>>
>> Where qemu's earlier writes are not yet / necessarily visible.
>>
>> Please see my original patch (which was incomplete) in the attachment,
>> it has a very verbose commit message.
>>
>> Anyway, I'll let others explain; they can word it better than I can :)
>>
>> FWIW,
>>
>> Series
>> Reviewed-by: Laszlo Ersek 
>>
>> I ported this series to a 3.17.0+ based kernel, and tested it. It works
>> fine. The ROM-like view of the NOR flash now reflects the previously
>> programmed contents.
>>
>> Series
>> Tested-by: Laszlo Ersek 
>>
>> Thanks!
>> Laszlo
>>
>>
>>
>> ___
>> kvmarm mailing list
>> kvm...@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v1 0/2] Define some VFIO interfaces for VT-d Posted-Interrupts

2014-11-20 Thread Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrtups Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

This patch set does the following things:

- Define a new VFIO group KVM_DEV_VFIO_INTERRUPT and it's attributes
KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ. Qemu can use this interface to
configure VT-d PI when guest updates the interrupt configuration
(MSI/MSI-X configuration).

- Define a new VFIO API: vfio_msi_get_irq(), which can be used by KVM
to get the host irq of the assigned devices. Then KVM can update the
associated IRTE for VT-d PI.

Feng Wu (2):
  vfio: Add new interrupt group for VFIO
  vfio: Add VFIO API vfio_msi_get_irq

 Documentation/virtual/kvm/devices/vfio.txt |8 
 drivers/vfio/pci/vfio_pci.c|   10 ++
 include/linux/vfio.h   |2 ++
 include/uapi/linux/kvm.h   |   14 ++
 4 files changed, 34 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq

2014-11-20 Thread Feng Wu

This API returns the host irq for the MSI/MSI-X interrrupts.

Signed-off-by: Feng Wu 
---
 drivers/vfio/pci/vfio_pci.c |   10 ++
 include/linux/vfio.h|2 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 9558da3..4fb9828 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1009,6 +1009,16 @@ put_devs:
kfree(devs.devices);
 }
 
+unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool 
msix)
+{
+   struct vfio_pci_device *vdev =
+   (struct vfio_pci_device *)vfio_device_data(device);
+   struct pci_dev *pdev = vdev->pdev;
+
+   return msix ? vdev->msix[vector].vector : pdev->irq + vector;
+}
+EXPORT_SYMBOL_GPL(vfio_msi_get_irq);
+
 static void __exit vfio_pci_cleanup(void)
 {
pci_unregister_driver(&vfio_pci_driver);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index d320411..007ca55 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver(
 /*
  * External user API
  */
+extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector,
+ bool msix);
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO

2014-11-20 Thread Feng Wu

Add new group KVM_DEV_VFIO_INTERRUPT and command
KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it.

This is used for VT-d Posted-Interrupts setup.

Signed-off-by: Feng Wu 
---
 Documentation/virtual/kvm/devices/vfio.txt |8 
 include/uapi/linux/kvm.h   |   14 ++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..bd99176 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -13,6 +13,7 @@ VFIO-group is held by KVM.
 
 Groups:
   KVM_DEV_VFIO_GROUP
+  KVM_DEV_VFIO_INTERRUPT
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes:
 
 For each, kvm_device_attr.addr points to an int32_t file descriptor
 for the VFIO group.
+
+KVM_DEV_VFIO_INTERRUPT attributes:
+  KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for
+VT-d Posted-Interrrupts
+
+For each, kvm_device_attr.addr points to struct kvm_posted_intr, which
+include the needed information for VT-d Posted-Interrupts setup.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6076882..5544fcc 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -943,9 +943,23 @@ struct kvm_device_attr {
__u64   addr;   /* userspace address of attr data */
 };
 
+struct virq_info {
+   __u32   index;  /* index of the msi/msix entry */
+   int virq;   /* virq of the interrupt */
+};
+
+struct kvm_posted_intr {
+   __u32   fd; /* file descriptor of the VFIO device */
+   __u32   count;
+   boolmsix;
+   struct virq_info virq_info[0];
+};
+
 #define  KVM_DEV_VFIO_GROUP1
 #define   KVM_DEV_VFIO_GROUP_ADD   1
 #define   KVM_DEV_VFIO_GROUP_DEL   2
+#define  KVM_DEV_VFIO_INTERRUPT2
+#define   KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ   1
 
 enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20= 1,
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Aneesh Kumar K.V

"Suresh E. Warrier"  writes:

> This patch adds trace points in the guest entry and exit code and also
> for exceptions handled by the host in kernel mode - hypercalls and page
> faults. The new events are added to /sys/kernel/debug/tracing/events
> under a new subsystem called kvm_hv.



>   /* Set this explicitly in case thread 0 doesn't have a vcpu */
> @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
>  
>   vc->vcore_state = VCORE_RUNNING;
>   preempt_disable();
> +
> + trace_kvmppc_run_core(vc, 0);
> +
>   spin_unlock(&vc->lock);

Do we really want to call tracepoint with spin lock held ? Is that a good
thing to do ?. 

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1] vfio: put off the allocation of "minor" in vfio_create_group

2014-11-20 Thread Zhen Lei

The next code fragment "list_for_each_entry" is not depend on "minor". With this
patch, the free of "minor" in "list_for_each_entry" can be reduced, and there is
no functional change.

Signed-off-by: Zhen Lei 
---
 drivers/vfio/vfio.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index f018d8d..737eb468 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct 
iommu_group *iommu_group)

mutex_lock(&vfio.group_lock);

-   minor = vfio_alloc_group_minor(group);
-   if (minor < 0) {
-   vfio_group_unlock_and_free(group);
-   return ERR_PTR(minor);
-   }
-
/* Did we race creating this group? */
list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
if (tmp->iommu_group == iommu_group) {
vfio_group_get(tmp);
-   vfio_free_group_minor(minor);
vfio_group_unlock_and_free(group);
return tmp;
}
}

+   minor = vfio_alloc_group_minor(group);
+   if (minor < 0) {
+   vfio_group_unlock_and_free(group);
+   return ERR_PTR(minor);
+   }
+
dev = device_create(vfio.class, NULL,
MKDEV(MAJOR(vfio.group_devt), minor),
group, "%d", iommu_group_id(iommu_group));
--
1.8.0


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds

2014-11-20 Thread Christian Borntraeger

Am 10.11.2014 um 22:07 schrieb Linus Torvalds:
[...]
> So before blacklisting any compilers, let's first see if
> 
>  (a) we can actually make it a real rule that we only use ACCESS_ONCE on 
> scalars
>  (b) we can somehow enforce this with a compiler warning/error for mis-uses
> 
> For example, the attached patch works for some cases, but shows how we
> use ACCESS_ONCE() on pointers to pte_t's etc, so it doesn't come even
> close to compiling the whole kernel. But I wonder how painful that
> would be to change.. The places where it complains are actually
> somewhat debatable to begin with, like:
> 
>  - handle_pte_fault(.. pte_t *pte ..):
> 
> entry = ACCESS_ONCE(*pte);
> 
> and the thing is, "pte" is actually possibly an 8-byte entity on
> x86-32, and that ACCESS_ONCE() fundamentally will be two 32-byte
> reads.
> 
> So there is a very valid argument for saying "well, you shouldn't do
> that, then", and that we might be better off cleaning up our
> ACCESS_ONCE() uses, than to just blindly blacklist compilers.
> 
> NOTE! I'm not at all advocating the attached patch. I'm sending it out
> white-space damaged on purpose, it's more of a "hey, something like
> this might be the direction we want to go in", with the spinlock.h
> part of the patch also acting as an example of the kind of changes the
> "ACCESS_ONCE() only works on scalars" rule would require.

So I tried to see if I can come up with some results on how often this problem 
happens...

[...]


> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index d5ad7b1118fc..63e82f1dfc1a 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -378,7 +378,11 @@ void ftrace_likely_update(struct
> ftrace_branch_data *f, int val, int expect);
>   * use is to mediate communication between process-level code and irq/NMI
>   * handlers, all running on the same CPU.
>   */
> -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#define get_scalar_volatile_pointer(x) ({ \
> +   typeof(x) *__p = &(x); \
> +   volatile typeof(x) *__vp = __p; \
> +   (void)(long)*__p; __vp; })
> +#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x))

..and just took this patch. On s390 is pretty much clean with allyesconfig
In fact with the siif lock changed only the pte/pmd cases you mentioned trigger 
a compile error:

mm/memory.c: In function 'handle_pte_fault':
mm/memory.c:3203:2: error: aggregate value used where an integer was expected
  entry = ACCESS_ONCE(*pte);

mm/rmap.c: In function 'mm_find_pmd':
mm/rmap.c:584:2: error: aggregate value used where an integer was expected
  pmde = ACCESS_ONCE(*pmd);


Here a barrier() might be a good solution as well, I guess.
On x86 allyesconfig its almost the same.
- we need your spinlock changes (well, something different to make it compile)
- we need to fix pmd and pte
- we have gup_get_pte in arch/x86/mm/gup.c getting a ptep

So It looks like we could make a change to ACCESS_ONCE. Would something like

CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start?

This would boil down to
Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set + 
docu update + comments
Patch2: Change mm/* to barriers
Patch3: Change x86 locks
Patch4: Change x86 gup
Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86

Makes sense?

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Alexander Graf



On 19.11.14 22:54, Suresh E. Warrier wrote:
> 
> 
> On 11/14/2014 04:56 AM, Alexander Graf wrote:
>>
>>
>>
>>> Am 14.11.2014 um 00:29 schrieb Suresh E. Warrier 
>>> :
>>>
>>> This patch adds trace points in the guest entry and exit code and also
>>> for exceptions handled by the host in kernel mode - hypercalls and page
>>> faults. The new events are added to /sys/kernel/debug/tracing/events
>>> under a new subsystem called kvm_hv.
>>>
>>> Acked-by: Paul Mackerras 
>>> Signed-off-by: Suresh Warrier 
>>> ---
>>> arch/powerpc/kvm/book3s_64_mmu_hv.c |  12 +-
>>> arch/powerpc/kvm/book3s_hv.c|  19 ++
>>> arch/powerpc/kvm/trace_hv.h | 497 
>>> 
>>> 3 files changed, 525 insertions(+), 3 deletions(-)
>>> create mode 100644 arch/powerpc/kvm/trace_hv.h
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
>>> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>> index 70feb7b..20cbad1 100644
>>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>> @@ -38,6 +38,7 @@
>>> #include 
>>>
>>> #include "book3s_hv_cma.h"
>>> +#include "trace_hv.h"
>>>
>>> /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
>>> #define MAX_LPID_97063
>>> @@ -627,6 +628,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
>>> struct kvm_vcpu *vcpu,
>>>gfn = gpa >> PAGE_SHIFT;
>>>memslot = gfn_to_memslot(kvm, gfn);
>>>
>>> +trace_kvm_page_fault_enter(vcpu, hpte, memslot, ea, dsisr);
>>> +
>>>/* No memslot means it's an emulated MMIO region */
>>>if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
>>>return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea,
>>> @@ -639,6 +642,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
>>> struct kvm_vcpu *vcpu,
>>>mmu_seq = kvm->mmu_notifier_seq;
>>>smp_rmb();
>>>
>>> +ret = -EFAULT;
>>>is_io = 0;
>>>pfn = 0;
>>>page = NULL;
>>> @@ -662,7 +666,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
>>> struct kvm_vcpu *vcpu,
>>>}
>>>up_read(¤t->mm->mmap_sem);
>>>if (!pfn)
>>> -return -EFAULT;
>>> +goto out_put;
>>>} else {
>>>page = pages[0];
>>>if (PageHuge(page)) {
>>> @@ -690,14 +694,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
>>> struct kvm_vcpu *vcpu,
>>>pfn = page_to_pfn(page);
>>>}
>>>
>>> -ret = -EFAULT;
>>>if (psize > pte_size)
>>>goto out_put;
>>>
>>>/* Check WIMG vs. the actual page we're accessing */
>>>if (!hpte_cache_flags_ok(r, is_io)) {
>>>if (is_io)
>>> -return -EFAULT;
>>> +goto out_put;
>>> +
>>>/*
>>> * Allow guest to map emulated device memory as
>>> * uncacheable, but actually make it cacheable.
>>> @@ -753,6 +757,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
>>> struct kvm_vcpu *vcpu,
>>>SetPageDirty(page);
>>>
>>>  out_put:
>>> +trace_kvm_page_fault_exit(vcpu, hpte, ret);
>>> +
>>>if (page) {
>>>/*
>>> * We drop pages[0] here, not page because page might
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 69d4085..5143d17 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -57,6 +57,9 @@
>>>
>>> #include "book3s.h"
>>>
>>> +#define CREATE_TRACE_POINTS
>>> +#include "trace_hv.h"
>>> +
>>> /* #define EXIT_DEBUG */
>>> /* #define EXIT_DEBUG_SIMPLE */
>>> /* #define EXIT_DEBUG_INT */
>>> @@ -1679,6 +1682,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
>>>list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
>>>kvmppc_start_thread(vcpu);
>>>kvmppc_create_dtl_entry(vcpu, vc);
>>> +trace_kvm_guest_enter(vcpu);
>>>}
>>>
>>>/* Set this explicitly in case thread 0 doesn't have a vcpu */
>>> @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
>>>
>>>vc->vcore_state = VCORE_RUNNING;
>>>preempt_disable();
>>> +
>>> +trace_kvmppc_run_core(vc, 0);
>>> +
>>>spin_unlock(&vc->lock);
>>>
>>>kvm_guest_enter();
>>> @@ -1732,6 +1739,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
>>>kvmppc_core_pending_dec(vcpu))
>>>kvmppc_core_dequeue_dec(vcpu);
>>>
>>> +trace_kvm_guest_exit(vcpu);
>>> +
>>>ret = RESUME_GUEST;
>>>if (vcpu->arch.trap)
>>>ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
>>> @@ -1757,6 +1766,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
>>>wake_up(&vcpu->arch.cpu_run);
>>>}
>>>}
>>> +
>>> +trace_kvmppc_run_core(vc, 1);
>>> }
>>>
>>> /*
>>> @@ -1783,11 +1794,13 @@ static void kvmppc_vcore_blocked(struct 
>>> kvmppc_vcore *vc)
>>>
>>>prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
>>>vc->vcore_state = VCORE_SLEEPING;
>>> +trace_kvmppc_vcore_blocked(vc, 0);
>>>spin_unlock(&vc->lock);
>>>schedule();
>>>fini

Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Alexander Graf



On 20.11.14 11:40, Aneesh Kumar K.V wrote:
> "Suresh E. Warrier"  writes:
> 
>> This patch adds trace points in the guest entry and exit code and also
>> for exceptions handled by the host in kernel mode - hypercalls and page
>> faults. The new events are added to /sys/kernel/debug/tracing/events
>> under a new subsystem called kvm_hv.
> 
> 
> 
>>  /* Set this explicitly in case thread 0 doesn't have a vcpu */
>> @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
>>  
>>  vc->vcore_state = VCORE_RUNNING;
>>  preempt_disable();
>> +
>> +trace_kvmppc_run_core(vc, 0);
>> +
>>  spin_unlock(&vc->lock);
> 
> Do we really want to call tracepoint with spin lock held ? Is that a good
> thing to do ?. 

I thought it was safe to call tracepoints inside of spin lock regions?
Steve?


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: ia64: remove

2014-11-20 Thread Radim Krčmář

2014-11-19 22:05+0100, Paolo Bonzini:
> KVM for ia64 has been marked as broken not just once, but twice even,
> and the last patch from the maintainer is now roughly 5 years old.
> Time for it to rest in piece.
> 
> Signed-off-by: Paolo Bonzini 
> ---

Nice, if only every diffstat was like that!

I propose another removal.
(The reasoning below wasn't confirmed with ia64 compiler.
 I'd remove the ioctls even if they worked.)

---8<---
KVM: remove buggy ia64 specific ioctls

IA64 is no longer present so new applications shouldn't use them.

The main problem is that they most likely didn't work even before,
because we have misused ioctl

  #define KVM_SET_GUEST_DEBUG   _IOW(KVMIO,  0x9b, struct kvm_guest_debug)
  #define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)

as

  struct kvm_guest_debug {
__u32 control;
__u32 pad;
struct kvm_guest_debug_arch arch;
  };

and

  struct kvm_guest_debug_arch {
  };

mean that

  sizeof(struct kvm_guest_debug) == sizeof(void *) == 8

thus

  KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK

and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK.

Signed-off-by: Radim Krčmář 
---
 include/uapi/linux/kvm.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6d59e5b..a37fd12 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping {
 #define KVM_X86_SETUP_MCE _IOW(KVMIO,  0x9c, __u64)
 #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO,  0x9d, __u64)
 #define KVM_X86_SET_MCE   _IOW(KVMIO,  0x9e, struct kvm_x86_mce)
-/* IA64 stack access */
-#define KVM_IA64_VCPU_GET_STACK   _IOR(KVMIO,  0x9a, void *)
-#define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)
 /* Available with KVM_CAP_VCPU_EVENTS */
 #define KVM_GET_VCPU_EVENTS   _IOR(KVMIO,  0x9f, struct kvm_vcpu_events)
 #define KVM_SET_VCPU_EVENTS   _IOW(KVMIO,  0xa0, struct kvm_vcpu_events)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/

2014-11-20 Thread Paolo Bonzini

ia64 does not need them anymore.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/include/asm/kvm_host.h   | 16 
 arch/x86/kvm/Makefile |  5 ++---
 {virt => arch/x86}/kvm/ioapic.c   |  0
 {virt => arch/x86}/kvm/ioapic.h   |  1 -
 {virt => arch/x86}/kvm/irq_comm.c |  4 ++--
 arch/x86/kvm/x86.c|  1 +
 include/linux/kvm_host.h  | 22 --
 virt/kvm/eventfd.c|  7 ---
 virt/kvm/kvm_main.c   |  3 ---
 9 files changed, 29 insertions(+), 30 deletions(-)
 rename {virt => arch/x86}/kvm/ioapic.c (100%)
 rename {virt => arch/x86}/kvm/ioapic.h (98%)
 rename {virt => arch/x86}/kvm/irq_comm.c (98%)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 769db36a3001..76ff3e2d8fd2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -603,6 +603,9 @@ struct kvm_arch {
 
struct kvm_xen_hvm_config xen_hvm_config;
 
+   /* reads protected by irq_srcu, writes by irq_lock */
+   struct hlist_head mask_notifier_list;
+
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
@@ -819,6 +822,19 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
+struct kvm_irq_mask_notifier {
+   void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked);
+   int irq;
+   struct hlist_node link;
+};
+
+void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
+   struct kvm_irq_mask_notifier *kimn);
+void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
+ struct kvm_irq_mask_notifier *kimn);
+void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
+bool mask);
+
 extern bool tdp_enabled;
 
 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 25d22b2d6509..ee1cd92b03be 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,14 +7,13 @@ CFLAGS_vmx.o := -I.
 
 KVM := ../../../virt/kvm
 
-kvm-y  += $(KVM)/kvm_main.o $(KVM)/ioapic.o \
-   $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
+kvm-y  += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= $(KVM)/assigned-dev.o $(KVM)/iommu.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-  i8254.o cpuid.o pmu.o
+  i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
 kvm-intel-y+= vmx.o
 kvm-amd-y  += svm.o
 
diff --git a/virt/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
similarity index 100%
rename from virt/kvm/ioapic.c
rename to arch/x86/kvm/ioapic.c
diff --git a/virt/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
similarity index 98%
rename from virt/kvm/ioapic.h
rename to arch/x86/kvm/ioapic.h
index dc3baa3a538f..deac8d509f2a 100644
--- a/virt/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -96,7 +96,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
-void kvm_vcpu_request_scan_ioapic(struct kvm *kvm);
 void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
u32 *tmr);
 
diff --git a/virt/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
similarity index 98%
rename from virt/kvm/irq_comm.c
rename to arch/x86/kvm/irq_comm.c
index 1345bde064f5..e9c135b639aa 100644
--- a/virt/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -234,7 +234,7 @@ void kvm_register_irq_mask_notifier(struct kvm *kvm, int 
irq,
 {
mutex_lock(&kvm->irq_lock);
kimn->irq = irq;
-   hlist_add_head_rcu(&kimn->link, &kvm->mask_notifier_list);
+   hlist_add_head_rcu(&kimn->link, &kvm->arch.mask_notifier_list);
mutex_unlock(&kvm->irq_lock);
 }
 
@@ -256,7 +256,7 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned 
irqchip, unsigned pin,
idx = srcu_read_lock(&kvm->irq_srcu);
gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
if (gsi != -1)
-   hlist_for_each_entry_rcu(kimn, &kvm->mask_notifier_list, link)
+   hlist_for_each_entry_rcu(kimn, &kvm->arch.mask_notifier_list, 
link)
if (kimn->irq == gsi)
kimn->func(kimn, mask);
srcu_read_unlock(&kvm->irq_srcu, idx);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a8f53a6960fd..5337039427c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c

Re: [PATCH] KVM: ia64: remove

2014-11-20 Thread Paolo Bonzini



On 20/11/2014 14:38, Radim Krčmář wrote:
> I propose another removal.
> (The reasoning below wasn't confirmed with ia64 compiler.
> I'd remove the ioctls even if they worked.)

Checked and applied, thanks.

Paolo

> ---8<---
> KVM: remove buggy ia64 specific ioctls
> 
> IA64 is no longer present so new applications shouldn't use them.
> 
> The main problem is that they most likely didn't work even before,
> because we have misused ioctl
> 
>   #define KVM_SET_GUEST_DEBUG   _IOW(KVMIO,  0x9b, struct kvm_guest_debug)
>   #define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)
> 
> as
> 
>   struct kvm_guest_debug {
>   __u32 control;
>   __u32 pad;
>   struct kvm_guest_debug_arch arch;
>   };
> 
> and
> 
>   struct kvm_guest_debug_arch {
>   };
> 
> mean that
> 
>   sizeof(struct kvm_guest_debug) == sizeof(void *) == 8
> 
> thus
> 
>   KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK
> 
> and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  include/uapi/linux/kvm.h | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6d59e5b..a37fd12 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_X86_SETUP_MCE _IOW(KVMIO,  0x9c, __u64)
>  #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO,  0x9d, __u64)
>  #define KVM_X86_SET_MCE   _IOW(KVMIO,  0x9e, struct kvm_x86_mce)
> -/* IA64 stack access */
> -#define KVM_IA64_VCPU_GET_STACK   _IOR(KVMIO,  0x9a, void *)
> -#define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)
>  /* Available with KVM_CAP_VCPU_EVENTS */
>  #define KVM_GET_VCPU_EVENTS   _IOR(KVMIO,  0x9f, struct kvm_vcpu_events)
>  #define KVM_SET_VCPU_EVENTS   _IOW(KVMIO,  0xa0, struct kvm_vcpu_events)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Steven Rostedt

On Thu, 20 Nov 2014 13:10:12 +0100
Alexander Graf  wrote:

> 
> 
> On 20.11.14 11:40, Aneesh Kumar K.V wrote:
> > "Suresh E. Warrier"  writes:
> > 
> >> This patch adds trace points in the guest entry and exit code and also
> >> for exceptions handled by the host in kernel mode - hypercalls and page
> >> faults. The new events are added to /sys/kernel/debug/tracing/events
> >> under a new subsystem called kvm_hv.
> > 
> > 
> > 
> >>/* Set this explicitly in case thread 0 doesn't have a vcpu */
> >> @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
> >>  
> >>vc->vcore_state = VCORE_RUNNING;
> >>preempt_disable();
> >> +
> >> +  trace_kvmppc_run_core(vc, 0);
> >> +
> >>spin_unlock(&vc->lock);
> > 
> > Do we really want to call tracepoint with spin lock held ? Is that a good
> > thing to do ?. 
> 
> I thought it was safe to call tracepoints inside of spin lock regions?
> Steve?
> 

There's tracepoints in the guts of the scheduler where rq lock is held.
Don't worry about it. The tracing system is lockless.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 01/19] vfio/platform: initial skeleton of VFIO support for platform devices

2014-11-20 Thread Antonios Motakis

On Wed, Nov 12, 2014 at 5:49 PM, Alex Williamson
 wrote:
>
> On Wed, 2014-11-12 at 11:05 +0100, Eric Auger wrote:
> > Hi Antonios,
> >
> > On 10/27/2014 07:07 PM, Antonios Motakis wrote:
> > > This patch forms the common skeleton code for platform devices support
> > > with VFIO. This will include the core functionality of VFIO_PLATFORM,
> > > however binding to the device and discovering the device resources will
> > > be done with the help of a separate file where any Linux platform bus
> > > specific code will reside.
> > >
> > > This will allow us to implement support for also discovering AMBA devices
> > > and their resources, but still reuse a large part of the VFIO_PLATFORM
> > > implementation.
> > >
> > > Signed-off-by: Antonios Motakis 
> > > ---
> > >  drivers/vfio/platform/vfio_platform_common.c  | 126 
> > > ++
> > >  drivers/vfio/platform/vfio_platform_private.h |  36 
> > >  2 files changed, 162 insertions(+)
> > >  create mode 100644 drivers/vfio/platform/vfio_platform_common.c
> > >  create mode 100644 drivers/vfio/platform/vfio_platform_private.h
> > >
> > > diff --git a/drivers/vfio/platform/vfio_platform_common.c 
> > > b/drivers/vfio/platform/vfio_platform_common.c
> > > new file mode 100644
> > > index 000..e0fdbc8
> > > --- /dev/null
> > > +++ b/drivers/vfio/platform/vfio_platform_common.c
> > > @@ -0,0 +1,126 @@
> > > +/*
> > > + * Copyright (C) 2013 - Virtual Open Systems
> > > + * Author: Antonios Motakis 
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License, version 2, as
> > > + * published by the Free Software Foundation.
> > > + *
> > > + * This program is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > not sure at that state all the above includes are needed.
> > > +
> > > +#include "vfio_platform_private.h"
> > > +
> > > +static void vfio_platform_release(void *device_data)
> > > +{
> > > +   module_put(THIS_MODULE);
> > > +}
> > > +
> > > +static int vfio_platform_open(void *device_data)
> > > +{
> > > +   if (!try_module_get(THIS_MODULE))
> > > +   return -ENODEV;
> > > +
> > > +   return 0;
> > > +}
> > > +
> > > +static long vfio_platform_ioctl(void *device_data,
> > > +  unsigned int cmd, unsigned long arg)
> > a minor style comment/question that applies on all the series. Shouldn't
> > subsequent argument lines rather be aligned with the first char after
> > "(", as done in PCI code?
>
> It's also my preferred style to indent to just after the open paren on
> wrapped lines where possible, but I don't think there are hard rules in
> CodingStyle or checkpatch that enforce this, so I often let it slide.
> Thanks,
>

You're right that there are no hard coding style rules for this, but I
also like this style so I'll apply it more consistently.

> Alex
>
> > > +{
> > > +   if (cmd == VFIO_DEVICE_GET_INFO)
> > > +   return -EINVAL;
> > > +
> > > +   else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
> > > +   return -EINVAL;
> > > +
> > > +   else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
> > > +   return -EINVAL;
> > > +
> > > +   else if (cmd == VFIO_DEVICE_SET_IRQS)
> > > +   return -EINVAL;
> > > +
> > > +   else if (cmd == VFIO_DEVICE_RESET)
> > > +   return -EINVAL;
> > > +
> > > +   return -ENOTTY;
> > > +}
> > > +
> > > +static ssize_t vfio_platform_read(void *device_data, char __user *buf,
> > > + size_t count, loff_t *ppos)
> > > +{
> > > +   return -EINVAL;
> > > +}
> > > +
> > > +static ssize_t vfio_platform_write(void *device_data, const char __user 
> > > *buf,
> > > +  size_t count, loff_t *ppos)
> > > +{
> > > +   return -EINVAL;
> > > +}
> > > +
> > > +static int vfio_platform_mmap(void *device_data, struct vm_area_struct 
> > > *vma)
> > > +{
> > > +   return -EINVAL;
> > > +}
> > > +
> > > +static const struct vfio_device_ops vfio_platform_ops = {
> > > +   .name   = "vfio-platform",
> > > +   .open   = vfio_platform_open,
> > > +   .release= vfio_platform_release,
> > > +   .ioctl  = vfio_platform_ioctl,
> > > +   .read   = vfio_platform_read,
> > > +   .write  = vfio_platform_write,
> > > +   .mmap   = vfio_platform_mmap,
> > > +};
> > > +
> > > +int vfio_platform_probe_common(struct vfio_platform_device *vdev,
> > > +  struct device *dev)
> > > +{
> > > +   struct iommu_group *group;
> > > +   int ret;
> > > +
> >

Re: [PATCH v9 06/19] vfio/platform: return info for bound device

2014-11-20 Thread Antonios Motakis

On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson
 wrote:
> On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote:
>> On 10/27/2014 07:07 PM, Antonios Motakis wrote:
>> > A VFIO userspace driver will start by opening the VFIO device
>> > that corresponds to an IOMMU group, and will use the ioctl interface
>> > to get the basic device info, such as number of memory regions and
>> > interrupts, and their properties. This patch enables the
>> > VFIO_DEVICE_GET_INFO ioctl call.
>> >
>> > Signed-off-by: Antonios Motakis 
>> > ---
>> >  drivers/vfio/platform/vfio_platform_common.c | 23 ---
>> >  1 file changed, 20 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/vfio/platform/vfio_platform_common.c 
>> > b/drivers/vfio/platform/vfio_platform_common.c
>> > index e0fdbc8..cb20526 100644
>> > --- a/drivers/vfio/platform/vfio_platform_common.c
>> > +++ b/drivers/vfio/platform/vfio_platform_common.c
>> > @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data)
>> >  static long vfio_platform_ioctl(void *device_data,
>> >unsigned int cmd, unsigned long arg)
>> >  {
>> > -   if (cmd == VFIO_DEVICE_GET_INFO)
>> > -   return -EINVAL;
>> > +   struct vfio_platform_device *vdev = device_data;
>> > +   unsigned long minsz;
>> > +
>> > +   if (cmd == VFIO_DEVICE_GET_INFO) {
>> > +   struct vfio_device_info info;
>> > +
>> > +   minsz = offsetofend(struct vfio_device_info, num_irqs);
>> > +
>> > +   if (copy_from_user(&info, (void __user *)arg, minsz))
>> > +   return -EFAULT;
>> > +
>> > +   if (info.argsz < minsz)
>> > +   return -EINVAL;
>> > +
>> > +   info.flags = vdev->flags;
>> > +   info.num_regions = 0;
>> > +   info.num_irqs = 0;
>> Seems a bit weird to me to enable the modality but returning zeroed
>> values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO
>> and VFIO_DEVICE_GET_IRQ_INFO ones?
>
> I actually like how Antonios has started from a base framework, exposing
> a device but none of the resources and then incrementally adds each
> component.  It's also a good showcase of the VFIO ABI that we can do
> things like this.  Thanks,

I also agree with Alex with this. But of course I'm not married with
any particular splitting style, in case we decide to change this.

>
> Alex
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-20 Thread Antonios Motakis

On Thu, Nov 13, 2014 at 9:05 AM, Hongbo Zhang  wrote:
> On 12 November 2014 17:57, Antonios Motakis
>  wrote:
>> Hello Hongbo,
>>
>> On Wed, Nov 12, 2014 at 10:52 AM, Hongbo Zhang  
>> wrote:
>>> On 28 October 2014 02:07, Antonios Motakis
>>>  wrote:

 Enable building the VFIO PLATFORM driver that allows to use Linux platform
 devices with VFIO.

 Signed-off-by: Antonios Motakis 
 ---
  drivers/vfio/Kconfig   | 1 +
  drivers/vfio/Makefile  | 1 +
  drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
  4 files changed, 15 insertions(+)
  create mode 100644 drivers/vfio/platform/Kconfig
  create mode 100644 drivers/vfio/platform/Makefile

 diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
 index a0abe04..962fb80 100644
 --- a/drivers/vfio/Kconfig
 +++ b/drivers/vfio/Kconfig
 @@ -27,3 +27,4 @@ menuconfig VFIO
   If you don't know what to do here, say N.

  source "drivers/vfio/pci/Kconfig"
 +source "drivers/vfio/platform/Kconfig"
 diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
 index 0b035b1..dadf0ca 100644
 --- a/drivers/vfio/Makefile
 +++ b/drivers/vfio/Makefile
 @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
  obj-$(CONFIG_VFIO_PCI) += pci/
 +obj-$(CONFIG_VFIO_PLATFORM) += platform/
 diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
 new file mode 100644
 index 000..c51af17
 --- /dev/null
 +++ b/drivers/vfio/platform/Kconfig
 @@ -0,0 +1,9 @@
 +config VFIO_PLATFORM
 +   tristate "VFIO support for platform devices"
 +   depends on VFIO && EVENTFD && ARM
>>>
>>> Hi Antonios,
>>> Is this only for ARM? how about X86 and PowerPC?
>>> On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
>>> Access Management Unit), and I am trying to use this VFIO framework on
>>> it.
>>>
>>
>> In principle it should be working on any platform with such devices;
>> as long as you have a VFIO IOMMU driver for the PAMU (on ARM we use
>> VFIO PLATFORM for the device, with VFIO IOMMU TYPE1 for the IOMMU).
>>
>
> Antonios,
> As far as you know, on which ARM platform can I apply your patches directly?
> My purpose is to apply you patches[1], and then implement a user space
> driver to evaluate the performance.
>

In principle, if your target has a working IOMMU in front of a
platform or AMBA device, then you should be able to use this.

In practice, I have tested this on various fast models, and less
extensively on Arndale in the past. Linaro as far as I know has tested
this series on Calxeda devices with an xgmac NIC.

> [1]  It is better without manually merging conflicts/dependencies etc,
> I am vfio-platform user, not a iommu expert.
>
>> So if you have a suitable IOMMU driver for your target, feel free to
>> test it, and let us know of the results.
>>

 +   help
 + Support for platform devices with VFIO. This is required to make
 + use of platform devices present on the system using the VFIO
 + framework.
 +
 + If you don't know what to do here, say N.
 diff --git a/drivers/vfio/platform/Makefile 
 b/drivers/vfio/platform/Makefile
 new file mode 100644
 index 000..279862b
 --- /dev/null
 +++ b/drivers/vfio/platform/Makefile
 @@ -0,0 +1,4 @@
 +
 +vfio-platform-y := vfio_platform.o vfio_platform_common.o
 +
 +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 --
 2.1.1

 --
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 06/19] vfio/platform: return info for bound device

2014-11-20 Thread Eric Auger

On 11/20/2014 03:10 PM, Antonios Motakis wrote:
> On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson
>  wrote:
>> On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote:
>>> On 10/27/2014 07:07 PM, Antonios Motakis wrote:
 A VFIO userspace driver will start by opening the VFIO device
 that corresponds to an IOMMU group, and will use the ioctl interface
 to get the basic device info, such as number of memory regions and
 interrupts, and their properties. This patch enables the
 VFIO_DEVICE_GET_INFO ioctl call.

 Signed-off-by: Antonios Motakis 
 ---
  drivers/vfio/platform/vfio_platform_common.c | 23 ---
  1 file changed, 20 insertions(+), 3 deletions(-)

 diff --git a/drivers/vfio/platform/vfio_platform_common.c 
 b/drivers/vfio/platform/vfio_platform_common.c
 index e0fdbc8..cb20526 100644
 --- a/drivers/vfio/platform/vfio_platform_common.c
 +++ b/drivers/vfio/platform/vfio_platform_common.c
 @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data)
  static long vfio_platform_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
  {
 -   if (cmd == VFIO_DEVICE_GET_INFO)
 -   return -EINVAL;
 +   struct vfio_platform_device *vdev = device_data;
 +   unsigned long minsz;
 +
 +   if (cmd == VFIO_DEVICE_GET_INFO) {
 +   struct vfio_device_info info;
 +
 +   minsz = offsetofend(struct vfio_device_info, num_irqs);
 +
 +   if (copy_from_user(&info, (void __user *)arg, minsz))
 +   return -EFAULT;
 +
 +   if (info.argsz < minsz)
 +   return -EINVAL;
 +
 +   info.flags = vdev->flags;
 +   info.num_regions = 0;
 +   info.num_irqs = 0;
>>> Seems a bit weird to me to enable the modality but returning zeroed
>>> values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO
>>> and VFIO_DEVICE_GET_IRQ_INFO ones?
>>
>> I actually like how Antonios has started from a base framework, exposing
>> a device but none of the resources and then incrementally adds each
>> component.  It's also a good showcase of the VFIO ABI that we can do
>> things like this.  Thanks,
> 
> I also agree with Alex with this. But of course I'm not married with
> any particular splitting style, in case we decide to change this.

Hi Antonios,
please keep as is. I also learn each day about splitting style ;-)
Best Regards
Eric
> 
>>
>> Alex
>>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support

2014-11-20 Thread Anup Patel

On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
 wrote:
> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> Hi All,
>>
>> I have second thoughts about rebasing KVM PMU patches
>> to Marc's irq-forwarding patches.
>>
>> The PMU IRQs (when virtualized by KVM) are not exactly
>> forwarded IRQs because they are shared between Host
>> and Guest.
>>
>> Scenario1
>> -
>>
>> We might have perf running on Host and no KVM guest
>> running. In this scenario, we wont get interrupts on Host
>> because the kvm_pmu_hyp_init() (similar to the function
>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> implementation) has put all host PMU IRQs in forwarding
>> mode.
>>
>> The only way solve this problem is to not set forwarding
>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> have special routines to turn on and turn off the forwarding
>> mode of PMU IRQs. These routines will be called from
>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> forwarding state.
>>
>> Scenario2
>> -
>>
>> We might have perf running on Host and Guest simultaneously
>> which means it is quite likely that PMU HW trigger IRQ meant
>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> of Marc's patchset which is called before local_irq_enable()).
>>
>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> will accidentally forward IRQ meant for Host to Guest unless
>> we put additional checks to inspect VCPU PMU state.
>>
>> Am I missing any detail about IRQ forwarding for above
>> scenarios?
>>
> Hi Anup,

Hi Christoffer,

>
> I briefly discussed this with Marc.  What I don't understand is how it
> would be possible to get an interrupt for the host while running the
> guest?
>
> The rationale behind my question is that whenever you're running the
> guest, the PMU should be programmed exclusively with guest state, and
> since the PMU is per core, any interrupts should be for the guest, where
> it would always be pending.

Yes, thats right PMU is programmed exclusively for guest when
guest is running and for host when host is running.

Let us assume a situation (Scenario2 mentioned previously)
where both host and guest are using PMU. When the guest is
running we come back to host mode due to variety of reasons
(stage2 fault, guest IO, regular host interrupt, host interrupt
meant for guest, ) which means we will return from the
"ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
At this point we would have restored back host PMU context and
any PMU counter used by host can trigger PMU overflow interrup
for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
in the kvm_arch_vcpu_ioctl_run() function (similar to the
kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
which will try to detect PMU irq forwarding state in GIC hence it
can accidentally discover PMU irq pending for guest while this
PMU irq is actually meant for host.

This above mentioned situation does not happen for timer
because virtual timer interrupts are exclusively used for guest.
The exclusive use of virtual timer interrupt for guest ensures that
the function kvm_timer_sync_hwstate() will always see correct
state of virtual timer IRQ from GIC.

>
> When migrating a VM with a pending PMU interrupt away for a CPU core, we
> also capture the active state (the forwarding patches already handle
> this), and obviously the PMU state along with it.

Yes, the migration of PMU state and PMU interrupt state is
quite clear.

>
> Does this address your concern?

I hope above description give you idea about the concern
raised by me.

>
> -Christoffer

Regards,
Anup
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] vfio: put off the allocation of "minor" in vfio_create_group

2014-11-20 Thread Alex Williamson

On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote:
> The next code fragment "list_for_each_entry" is not depend on "minor". With 
> this
> patch, the free of "minor" in "list_for_each_entry" can be reduced, and there 
> is
> no functional change.

A reasonable micro-optimization, but I'm curious if you're actually
seeing some measurable overhead from this.  It seems like we'd need to
have multiple devices, all within the same IOMMU group, all probed by
vfio-pci at the same time to exercise the race condition.  Thanks,

Alex

> Signed-off-by: Zhen Lei 
> ---
>  drivers/vfio/vfio.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index f018d8d..737eb468 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct 
> iommu_group *iommu_group)
> 
>   mutex_lock(&vfio.group_lock);
> 
> - minor = vfio_alloc_group_minor(group);
> - if (minor < 0) {
> - vfio_group_unlock_and_free(group);
> - return ERR_PTR(minor);
> - }
> -
>   /* Did we race creating this group? */
>   list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
>   if (tmp->iommu_group == iommu_group) {
>   vfio_group_get(tmp);
> - vfio_free_group_minor(minor);
>   vfio_group_unlock_and_free(group);
>   return tmp;
>   }
>   }
> 
> + minor = vfio_alloc_group_minor(group);
> + if (minor < 0) {
> + vfio_group_unlock_and_free(group);
> + return ERR_PTR(minor);
> + }
> +
>   dev = device_create(vfio.class, NULL,
>   MKDEV(MAJOR(vfio.group_devt), minor),
>   group, "%d", iommu_group_id(iommu_group));
> --
> 1.8.0
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO

2014-11-20 Thread Alex Williamson

On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
> Add new group KVM_DEV_VFIO_INTERRUPT and command
> KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it.
> 
> This is used for VT-d Posted-Interrupts setup.

Eric proposed an interface for ARM forwarded interrupts[1] using group
KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and
KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ.  Why are we proposing yet another
group and attributes here?  Why can't we re-use the ones Eric proposes?

[1] https://lkml.org/lkml/2014/8/25/258

> Signed-off-by: Feng Wu 
> ---
>  Documentation/virtual/kvm/devices/vfio.txt |8 
>  include/uapi/linux/kvm.h   |   14 ++
>  2 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
> b/Documentation/virtual/kvm/devices/vfio.txt
> index ef51740..bd99176 100644
> --- a/Documentation/virtual/kvm/devices/vfio.txt
> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> @@ -13,6 +13,7 @@ VFIO-group is held by KVM.
>  
>  Groups:
>KVM_DEV_VFIO_GROUP
> +  KVM_DEV_VFIO_INTERRUPT
>  
>  KVM_DEV_VFIO_GROUP attributes:
>KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
> @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes:
>  
>  For each, kvm_device_attr.addr points to an int32_t file descriptor
>  for the VFIO group.
> +
> +KVM_DEV_VFIO_INTERRUPT attributes:
> +  KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for
> +VT-d Posted-Interrrupts
> +
> +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which
> +include the needed information for VT-d Posted-Interrupts setup.
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6076882..5544fcc 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -943,9 +943,23 @@ struct kvm_device_attr {
>   __u64   addr;   /* userspace address of attr data */
>  };
>  
> +struct virq_info {
> + __u32   index;  /* index of the msi/msix entry */
> + int virq;   /* virq of the interrupt */
> +};
> +
> +struct kvm_posted_intr {
> + __u32   fd; /* file descriptor of the VFIO device */
> + __u32   count;
> + boolmsix;

Note that MSI-X (as opposed to MSI) is a PCI concept.  Being a VFIO
interface this should operate at VFIO IRQ index and sub-index.

> + struct virq_info virq_info[0];
> +};
> +
>  #define  KVM_DEV_VFIO_GROUP  1
>  #define   KVM_DEV_VFIO_GROUP_ADD 1
>  #define   KVM_DEV_VFIO_GROUP_DEL 2
> +#define  KVM_DEV_VFIO_INTERRUPT  2
> +#define   KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ 1
>  
>  enum kvm_device_type {
>   KVM_DEV_TYPE_FSL_MPIC_20= 1,



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO

2014-11-20 Thread Eric Auger

On 11/20/2014 04:53 PM, Alex Williamson wrote:
> On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
>> Add new group KVM_DEV_VFIO_INTERRUPT and command
>> KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it.
>>
>> This is used for VT-d Posted-Interrupts setup.
> 
> Eric proposed an interface for ARM forwarded interrupts[1] using group
> KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and
> KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ.  Why are we proposing yet another
> group and attributes here?  Why can't we re-use the ones Eric proposes?

Hi Alex, Feng

I share your point of view about the KVM_DEV_VFIO_DEVICE group. For the
attribute (renamed KVM_DEV_VFIO_DEVICE_FORWARD_IRQ in RFC v2) the issue
is I specify the kvm_device_attr.addr points to a kvm_arch_forwarded_irq
struct. Feng needs another struct - kvm_posted_intr -. An alternative is
to merge both structs if it makes sense.

Best Regards

Eric
> 
> [1] https://lkml.org/lkml/2014/8/25/258
> 
>> Signed-off-by: Feng Wu 
>> ---
>>  Documentation/virtual/kvm/devices/vfio.txt |8 
>>  include/uapi/linux/kvm.h   |   14 ++
>>  2 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
>> b/Documentation/virtual/kvm/devices/vfio.txt
>> index ef51740..bd99176 100644
>> --- a/Documentation/virtual/kvm/devices/vfio.txt
>> +++ b/Documentation/virtual/kvm/devices/vfio.txt
>> @@ -13,6 +13,7 @@ VFIO-group is held by KVM.
>>  
>>  Groups:
>>KVM_DEV_VFIO_GROUP
>> +  KVM_DEV_VFIO_INTERRUPT
>>  
>>  KVM_DEV_VFIO_GROUP attributes:
>>KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
>> @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes:
>>  
>>  For each, kvm_device_attr.addr points to an int32_t file descriptor
>>  for the VFIO group.
>> +
>> +KVM_DEV_VFIO_INTERRUPT attributes:
>> +  KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for
>> +VT-d Posted-Interrrupts
>> +
>> +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which
>> +include the needed information for VT-d Posted-Interrupts setup.
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6076882..5544fcc 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -943,9 +943,23 @@ struct kvm_device_attr {
>>  __u64   addr;   /* userspace address of attr data */
>>  };
>>  
>> +struct virq_info {
>> +__u32   index;  /* index of the msi/msix entry */
>> +int virq;   /* virq of the interrupt */
>> +};
>> +
>> +struct kvm_posted_intr {
>> +__u32   fd; /* file descriptor of the VFIO device */
>> +__u32   count;
>> +boolmsix;
> 
> Note that MSI-X (as opposed to MSI) is a PCI concept.  Being a VFIO
> interface this should operate at VFIO IRQ index and sub-index.
> 
>> +struct virq_info virq_info[0];
>> +};
>> +
>>  #define  KVM_DEV_VFIO_GROUP 1
>>  #define   KVM_DEV_VFIO_GROUP_ADD1
>>  #define   KVM_DEV_VFIO_GROUP_DEL2
>> +#define  KVM_DEV_VFIO_INTERRUPT 2
>> +#define   KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ1
>>  
>>  enum kvm_device_type {
>>  KVM_DEV_TYPE_FSL_MPIC_20= 1,
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq

2014-11-20 Thread Alex Williamson

On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
> This API returns the host irq for the MSI/MSI-X interrrupts.
> 
> Signed-off-by: Feng Wu 
> ---
>  drivers/vfio/pci/vfio_pci.c |   10 ++
>  include/linux/vfio.h|2 ++
>  2 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 9558da3..4fb9828 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1009,6 +1009,16 @@ put_devs:
>   kfree(devs.devices);
>  }
>  
> +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool 
> msix)
> +{
> + struct vfio_pci_device *vdev =
> + (struct vfio_pci_device *)vfio_device_data(device);
> + struct pci_dev *pdev = vdev->pdev;
> +
> + return msix ? vdev->msix[vector].vector : pdev->irq + vector;
> +}
> +EXPORT_SYMBOL_GPL(vfio_msi_get_irq);
> +

It's difficult to see what you're going to do with this, but it's making
the assumption that the vfio_device is a vfio_pci_device, which is the
wrong starting point.  Eric proposed[1] some VFIO external user
interface extensions, that have since been refined in discussion that
allows the external user (KVM in this case) access to the struct device
where it can get the IRQ vectors itself.  We either need to follow that
same path for posted interrupts, teaching KVM about struct pci_device in
order to extract the IRQ info, or we need to create another vfio
external user interface extension that can return IRQ info regardless of
the device type.  Thanks,

Alex


[1] https://lkml.org/lkml/2014/8/25/257

>  static void __exit vfio_pci_cleanup(void)
>  {
>   pci_unregister_driver(&vfio_pci_driver);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index d320411..007ca55 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver(
>  /*
>   * External user API
>   */
> +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector,
> +   bool msix);
>  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
>  extern void vfio_group_put_external_user(struct vfio_group *group);
>  extern int vfio_external_user_iommu_id(struct vfio_group *group);



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq

2014-11-20 Thread Eric Auger

On 11/20/2014 05:05 PM, Alex Williamson wrote:
> On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
>> This API returns the host irq for the MSI/MSI-X interrrupts.
>>
>> Signed-off-by: Feng Wu 
>> ---
>>  drivers/vfio/pci/vfio_pci.c |   10 ++
>>  include/linux/vfio.h|2 ++
>>  2 files changed, 12 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 9558da3..4fb9828 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -1009,6 +1009,16 @@ put_devs:
>>  kfree(devs.devices);
>>  }
>>  
>> +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool 
>> msix)
>> +{
>> +struct vfio_pci_device *vdev =
>> +(struct vfio_pci_device *)vfio_device_data(device);
>> +struct pci_dev *pdev = vdev->pdev;
>> +
>> +return msix ? vdev->msix[vector].vector : pdev->irq + vector;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_msi_get_irq);
>> +
> 
> It's difficult to see what you're going to do with this, but it's making
> the assumption that the vfio_device is a vfio_pci_device, which is the
> wrong starting point.  Eric proposed[1] some VFIO external user
> interface extensions, that have since been refined in discussion that
> allows the external user (KVM in this case) access to the struct device
> where it can get the IRQ vectors itself.  We either need to follow that
> same path for posted interrupts, teaching KVM about struct pci_device in
> order to extract the IRQ info, or we need to create another vfio
> external user interface extension that can return IRQ info regardless of
> the device type.  Thanks,
> 
> Alex
> 
> 
> [1] https://lkml.org/lkml/2014/8/25/257

Here is the v2: https://lkml.org/lkml/2014/9/1/347

Best Regards

Eric
> http://lwn.net/Articles/610087/
>>  static void __exit vfio_pci_cleanup(void)
>>  {
>>  pci_unregister_driver(&vfio_pci_driver);
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index d320411..007ca55 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver(
>>  /*
>>   * External user API
>>   */
>> +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector,
>> +  bool msix);
>>  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
>>  extern void vfio_group_put_external_user(struct vfio_group *group);
>>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM causes #GP on XRSTORS

2014-11-20 Thread Nadav Amit

Fenghua,

I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to 
get a #GP when it is trying to load the guest FPU.
One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I 
am not sure it is the only problem.
Was KVM ever tested with XRSTORS?

Thanks,
Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: can I make this work… (Foundation for accessibility project)

2014-11-20 Thread Eric S. Johansson



On 11/18/2014 9:57 AM, Eric S. Johansson wrote:


That's great to know. I will spin up a version of Windows 7 and give 
it a try given that I'm not looking at it, I can strip it down to the 
barest user interface elements and improve performance significantly.


I tried it and it took me approximately 10 to 12 hours to install 
Windows 7 twice and I didn't even finish installing the last time.


Here's what happened. The first time I installed it, it was a naïve 
install. Took all the defaults just set up the ISO and let the install 
run. Then I installed all the updates. Hours went by and it kind of came 
up and ran but then I tried to install the virt I/O drivers and the 
Windows installation lost its mind. Did some reading on how to make 
performance better and on using the virtio drivers in windows.


So I start of the second install, same size disk 25 GB, same amount of 
RAM, 1 GB and installed the ethernet, disk and balloon drivers at the 
right time. I also changed the cache to none, I/O something to native 
and I think that's about it. Anyway, that was not really any 
improvement. It's still was incredibly slow and this time it was 
complaining about running out of memory and packages install never 
finished. Just kept going and going going.  iptraf reported network io 
ranging from 3kbit to 100kbit range when the updates were running.


I'm accustomed to lesser performance on virtual machines. That's the 
hazard of a running on old and slow laptop  (dell e6400 (2.2ghz core 
duo, 8gb ram)[1]) but even virtual box is not this slow.  So what am I 
doing wrong? It would be nice to use a slow machine like this as many 
handcrips don't have a whole lot of resources for buying newer/faster 
machines. On the other hand, many of them use desktops and work from one 
place whereas someone like me is all over the map (quite literally).


--- eric
[1] Part of the reason I don't bother upgrading machines all that often 
is because it no matter how fast the CPU runs or how much memory I have, 
Windows always runs about the same speed.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock

2014-11-20 Thread Alexander Graf



On 05.11.14 02:21, Paul Mackerras wrote:
> From: "Aneesh Kumar K.V" 
> 
> In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the
> next HPTE without unlocking the invalid one.  In fact we should never
> find an invalid and unlocked HPTE in the rmap chain, but for robustness
> we should unlock it.  This adds the missing unlock.
> 
> Reported-by: Benjamin Herrenschmidt 
> Signed-off-by: Aneesh Kumar K.V 
> Signed-off-by: Paul Mackerras 

Thanks, applied to kvm-ppc-queue.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Book3S HV: ptes are big endian

2014-11-20 Thread Alexander Graf



On 03.11.14 16:35, Cédric Le Goater wrote:
> When being restored from qemu, the kvm_get_htab_header are in native
> endian, but the ptes are big endian. 
> 
> This patch fixes restore on a KVM LE host. Qemu also needs a fix for
> this :
> 
>  http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html
> 
> Signed-off-by: Cédric Le Goater 
> Cc: Paul Mackerras 
> Cc: Alexey Kardashevskiy 
> Cc: Gregory Kurz 
> 
> ---
> 
>  Tested on 3.17-rc7 with LE and BE host.
> 
>  
> 
>  arch/powerpc/kvm/book3s_64_mmu_hv.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
> ===
> --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1542,6 +1542,8 @@ static ssize_t kvm_htab_write(struct fil
>   err = -EFAULT;
>   if (__get_user(v, lbuf) || __get_user(r, lbuf + 1))
>   goto out;
> + v = be64_to_cpu(v);
> + r = be64_to_cpu(r);

This will trigger warnings with sparse. Please introduce new be64
variables that you do get_user on and that you then use as source for v
and r.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli

Hi,

On Fri, Oct 31, 2014 at 12:39:32PM -0700, Peter Feiner wrote:
> On Fri, Oct 31, 2014 at 11:29:49AM +0800, zhanghailiang wrote:
> > Agreed, but for doing live memory snapshot (VM is running when do 
> > snapsphot),
> > we have to do this (block the write action), because we have to save the 
> > page before it
> > is dirtied by writing action. This is the difference, compared to pre-copy 
> > migration.
> 
> Ah ha, I understand the difference now. I suppose that you have considered
> doing a traditional pre-copy migration (that is, passes over memory saving
> dirty pages, followed by a pause and a final dump of remaining dirty pages) to
> a file. Your approach has the advantage of having the VM pause time bounded by
> the time it takes to handle the userfault and do the write, as opposed to
> pre-copy migration which has a pause time bounded by the time it takes to do
> the final dump of dirty pages, which, in the worst case, is the time it takes
> to dump all of the guest memory!

It sounds really similar issue as live migration, one can implement a
precopy live snapshot, or a precopy+postcopy live snapshot or a pure
postcopy live snapshot.

The decision on the amount of precopy done before engaging postcopy
(zero passes, 1 pass, or more passes) would have similar tradeoffs
too, except instead of having to re-transmit the re-dirtied pages over
the wire, it would need to overwrite them to disk.

The more precopy passes, the longer it takes for the live snapshotting
process to finish and the more I/O there will be (for live migration it'd
be network bandwidth usage instead of amount of I/O), but the shortest
the postcopy runtime will be (and the shorter postcopy runtime is, the
fewer userfaults will end up triggering on writes, in turn reducing
the slowdown and the artificial fault latency introduced to the guest
runtime). But the more precopy passes the more overwriting will happen
during the "longer" precopy stage and the more overall load there will
be for the host (the otherwise idle part of the host).

For the postcopy live snapshot the wrprotect faults are quite
equivalent to the not-present faults of postcopy live migration logic.

> You could use the old fork & dump trick. Given that the guest's memory is
> backed by private VMA (as of a year ago when I last looked, is always the case
> for QEMU), you can have the kernel do the write protection for you.
> Essentially, you fork Qemu and, in the child process, dump the guest memory
> then exit. If the parent (including the guest) writes to guest memory, then it
> will fault and the kernel will copy the page. 
> 
> The fork & dump approach will give you the best performance w.r.t. guest pause
> times (i.e., just pausing for the COW fault handler), but it does have the
> distinct disadvantage of potentially using 2x the guest memory (i.e., if the
> parent process races ahead and writes to all of the pages before you finish 
> the
> dump). To mitigate memory copying, you could madvise MADV_DONTNEED the child
> memory as you copy it.

This is a very good point. fork must be evaluated first because it
literally already provides you a readonly memory snapshot of the guest
memory.

The memory cons mentioned above could lead to both -ENOMEM of too many
guests runs live snapshots at the same time in the same host, unless
overcommit_memory is set to 1 (0 by default). Even then if too many
live snapshots are running in parallel you could hit the OOM killer if
there are just a bit too many faults at the same time, or you could
hit heavy swapping which isn't ideal either.

In fact the -ENOMEM avoidance (with qemu failing) is one of the two
critical reasons why qemu always set the guest memory as
MADV_DONTFORK. But that's not the only reason.

To use the fork() trick you'd need to undo the MADV_DONTFORK first but
that would open another problem: there's a race condition between
fork() O_DIRECT and <4k hardblocksize of virtio-blk. If there's any
read() syscall with O_DIRECT with len=512 while fork() is running
(think if the aio runs in parallel with the live snapshot thread that
forks the child to dump the snapshot) and if the guest writes with the
CPU to any 512 fragment of the same page that is the destination
buffer of the write(len=512) (on two different 512bytes area of the
same guest page) the O_DIRECT write will get lost.

So to use fork we'd need to fix this longstanding race (I tried but in
the end we declared it an userland issue because it's not exploitable
to bypass permissions or corrupt kernel or unrelated memory). Or you'd
need to add locking between the dataplane/aio threads and the live
snapshot thread to ensure no direct-io I/O is ever in-flight while
fork runs.

The O_DIRECT however would only help if it's qemu TCG, if it's KVM
it's not even enough to stop O_DIRECT reads. KVM would use
gup(write=1) from the async-pf all the time... and then the shadow
pagetables would go out of sync (it won't destabilize the host of
course, but the guest memor

Re: [PATCH 2/5] KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI

2014-11-20 Thread Alexander Graf



On 03.11.14 05:51, Paul Mackerras wrote:
> From: Mahesh Salgaonkar 
> 
> When we get an HMI (hypervisor maintenance interrupt) while in a
> guest, we see that guest enters into paused state.  The reason is, in
> kvmppc_handle_exit_hv it falls through default path and returns to
> host instead of resuming guest.  This causes guest to enter into
> paused state.  HMI is a hypervisor only interrupt and it is safe to
> resume the guest since the host has handled it already.  This patch
> adds a switch case to resume the guest.
> 
> Without this patch we see guest entering into paused state with following
> console messages:
> 
> [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered]
> [ 3003.329356]  Error detail: Timer facility experienced an error
> [ 3003.329359]HMER: 0840
> [ 3003.329360]TFMR: 4a12000980a84000
> [ 3003.329366] vcpu c007c35094c0 (40):
> [ 3003.329368] pc  = c00c2ba0  msr = 80009032  trap = e60
> [ 3003.329370] r 0 = c021ddc0  r16 = 0046
> [ 3003.329372] r 1 = c0007a02bbd0  r17 = 327d5d98
> [ 3003.329375] r 2 = c10980b8  r18 = 1fc9a0b0
> [ 3003.329377] r 3 = c142d6b8  r19 = c142d6b8
> [ 3003.329379] r 4 = 0002  r20 = 
> [ 3003.329381] r 5 = c524a110  r21 = 
> [ 3003.329383] r 6 = 0001  r22 = 
> [ 3003.329386] r 7 =   r23 = c524a110
> [ 3003.329388] r 8 =   r24 = 0001
> [ 3003.329391] r 9 = 0001  r25 = c0007c31da38
> [ 3003.329393] r10 = c14280b8  r26 = 0002
> [ 3003.329395] r11 = 746f6f6c2f68656c  r27 = c524a110
> [ 3003.329397] r12 = 28004484  r28 = c0007c31da38
> [ 3003.329399] r13 = cfe01400  r29 = 0002
> [ 3003.329401] r14 = 0046  r30 = c3011e00
> [ 3003.329403] r15 = ffba  r31 = 0002
> [ 3003.329404] ctr = c041a670  lr  = c0272520
> [ 3003.329405] srr0 = c007e8d8 srr1 = 90001002
> [ 3003.329406] sprg0 =  sprg1 = cfe01400
> [ 3003.329407] sprg2 = cfe01400 sprg3 = 0005
> [ 3003.329408] cr = 48004482  xer = 2000  dsisr = 4200
> [ 3003.329409] dar = 010015020048
> [ 3003.329410] fault dar = 010015020048 dsisr = 4200
> [ 3003.329411] SLB (8 entries):
> [ 3003.329412]   ESID = c800 VSID = 40016e7779000510
> [ 3003.329413]   ESID = d801 VSID = 400142add1000510
> [ 3003.329414]   ESID = f804 VSID = 4000eb1a81000510
> [ 3003.329415]   ESID = 1f00080b VSID = 40004fda0a000d90
> [ 3003.329416]   ESID = 3f00080c VSID = 400039f536000d90
> [ 3003.329417]   ESID = 180d VSID = 0001251b35150d90
> [ 3003.329417]   ESID = 0100080e VSID = 4001e4609d90
> [ 3003.329418]   ESID = d8000819 VSID = 40013d349c000400
> [ 3003.329419] lpcr = c04881847001 sdr1 = 001b1906 last_inst = 
> 
> [ 3003.329421] trap=0xe60 | pc=0xc00c2ba0 | msr=0x80009032
> [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered]
> [ 3003.329526]  Error detail: Timer facility experienced an error
> [ 3003.329527]HMER: 0840
> [ 3003.329527]TFMR: 4a12000980a94000
> [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered]
> [ 3006.359792]  Error detail: Timer facility experienced an error
> [ 3006.359795]HMER: 0840
> [ 3006.359797]TFMR: 4a12000980a84000
> 
>  IdName   State
> 
>  2 guest2 running
>  3 guest3 paused
>  4 guest4 running
> 
> Signed-off-by: Mahesh Salgaonkar 
> Signed-off-by: Paul Mackerras 

Do we need this for PR running on bare metal as well?


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Exposing host debug capabilities to userspace

2014-11-20 Thread Alex Bennée

Hi,

I've almost finished the ARMv8 guest debug support but I have one
problem left to solve. userspace needs to know how many hardware debug
registers are available for GDB to use. This information is available
from the ID_AA64DFR0_EL1 register. Currently I abuse GET_ONE_REG to
fetch it's value however semantically this is poor as it's API is for
getting guest state not host state and they could theoretically have
different values.

So far the options I've examined are:

* KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)

As explained above, abusing a guest state API for host configuration.

* ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH)

This is used by GDB to access the host details in debug-monitors.
However the ptrace API really wants you to attach to a process before
calling PTRACE_GETREGSET. Currently I've tried attaching to the
thread_id of the vCPU but this fails with EPERM, I suspect because
attaching to your own threads likely upsets the kernel.

* KVM ioctl KVM_GET_DEBUGREGS

This is currently x86 only and looks like it's more aimed at debug
registers than capability stuff. Also I'm not sure what the state of
this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or
is one an older deprecated x86 only API?

* Export the information via sysfs

I suppose the correct canonical non-subsystem specific way to make this
information available it to expose the data in some sort of sysfs node?
However I don't see any existing sysfs structure for the CPU.

* Expand /proc/cpuinfo

I suspect adding extra text to be badly parsed by userspace is just
horrid and unacceptable behaviour ;-)

* Add another KVM ioctl?

This would have the downside of being specific to KVM and of course
proliferating the API space again.

I'm open to any suggestions and look forward to your valued feedback ;-)

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked

2014-11-20 Thread Alexander Graf



On 03.11.14 05:52, Paul Mackerras wrote:
> From: "Suresh E. Warrier" 
> 
> The kvmppc_vcore_blocked() code does not check for the wait condition
> after putting the process on the wait queue. This means that it is
> possible for an external interrupt to become pending, but the vcpu to
> remain asleep until the next decrementer interrupt.  The fix is to
> make one last check for pending exceptions and ceded state before
> calling schedule().
> 
> Signed-off-by: Suresh Warrier 
> Signed-off-by: Paul Mackerras 

I don't understand the race you're fixing here. Can you please explain it?


Alex

> ---
>  arch/powerpc/kvm/book3s_hv.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index cd7e030..1a7a281 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu 
> *vcpu, int wait_state)
>   */
>  static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>  {
> + struct kvm_vcpu *vcpu;
> + int do_sleep = 1;
> +
>   DEFINE_WAIT(wait);
>  
>   prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> +
> + /*
> +  * Check one last time for pending exceptions and ceded state after
> +  * we put ourselves on the wait queue
> +  */
> + list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
> + if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
> + do_sleep = 0;
> + break;
> + }
> + }
> +
> + if (!do_sleep) {
> + finish_wait(&vc->wq, &wait);
> + return;
> + }
> +
>   vc->vcore_state = VCORE_SLEEPING;
>   spin_unlock(&vc->lock);
>   schedule();
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] Some fixes for HV KVM on PPC

2014-11-20 Thread Alexander Graf



On 03.11.14 05:51, Paul Mackerras wrote:
> Here are fixes for five bugs which were found in the testing of our
> PowerKVM product.  The bugs range from guest performance issues to
> guest crashes and memory corruption.  Please apply.

Thanks, applied patches 1-4 to kvm-ppc-queue.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli

Hi,

On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote:
> Yes, you are right. This is what i really want, bypass all non-present faults
> and only track strict wrprotect faults. ;)
> 
> So, do you plan to support that in the userfault API?

Yes I think it's good idea to support wrprotect/COW faults too.

I just wanted to understand if there was any other reason why you
needed only wrprotect faults, because the non-present faults didn't
look like a big performance concern if they triggered in addition to
wrprotect faults, but it's certainly ok to optimize them away so it's
fully optimal.

All it takes to differentiate the behavior should be one more bit
during registration so you can select non-present, wrprotect faults or
both. postcopy live migration would select only non-present faults,
postcopy live snapshot would select only wrprotect faults, anything
like distributed shared memory supporting shared readonly access and
exclusive write access, would select both flags.

I just sent an (unfortunately) longish but way more detailed email
about live snapshotting with userfaultfd but I just wanted to give a
shorter answer here too :).

Thanks,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Mario Smarduch

On 11/20/2014 12:08 AM, Laszlo Ersek wrote:
> On 11/20/14 00:32, Mario Smarduch wrote:
>> Hi Laszlo,
>>
>> couple observations.
>>
>>  I'm wondering if access from qemu and guest won't
>> result in mixed memory attributes and if that's acceptable
>> to the CPU.
> 
> Normally this would be a problem I think (Jon raised the topic of live
> migration). However, for flash programming specifically, I think the
> guest's access pattern ensures that we'll see things OK.
> 
> When the guest issues the first write access, the memslot is deleted,
> and everything is forwarded to qemu, both reads and writes. In response
> qemu modifies the array that *otherwise* backs the flash. These
> modifications by qemu end up in the dcache mostly. When the guest is
> done "programming", it writes a special command (read array mode) at
> which point the memslot is recreated (as read-only) and flushed / set up
> for flushing during demand paging.
> 
> So from the emulated flash POV, the memslot either doesn't exist at all
> (and then qemu serves all accesses just fine), or it exists r/o, at
> which point qemu (host userspace) will have stopped writing to it, and
> will have set it up for flushing before and during guest read accesses.

I think beyond consistency, there should be no double mappings with
conflicting attributes at any time or CPU state is undefined. At least
that's what I recall for cases where identity mapping was cacheble and user
mmapp'ed regions uncacheable. Side effects like CPU hardstop or
victim invalidate of dirty cache line. With virtualization
extensions maybe behavior is different. I guess if you're not seeing
lock ups or crashes then it appears to work :) Probably more senior
folks in ARM community are in better position to address this,
but I thought I raise a flag.

> 
>> Also is if you update memory from qemu you may break
>> dirty page logging/migration.
> 
> Very probably. Jon said the same thing.
> 
>> Unless there is some other way
>> you keep track. Of course it may not be applicable in your
>> case (i.e. flash unused after boot).
> 
> The flash *is* used after boot, because the UEFI runtime variable
> services *are* exercised by the guest kernel. However those use the same
> access pattern (it's the same set of UEFI services just called by a
> different "client").
> 
> *Uncoordinated* access from guest and host in parallel will be a big
> problem; but we're not that far yet, and we need to get the flash
> problem sorted, so that we can at least boot and work on the basic
> stuff. The flash programming dance happens to provide coordination; the
> flash mode changes (which are equivalent to the teardown and the
> recreation of the memslot) can be considered barriers.
> 
> I hope this is acceptable for the time being...

Yeah I understand you have a more imediatte requirement to support,
migration
isssue is more fyi. Thanks for the details helps to understand the context.

- Mario
> 
> Thanks
> Laszlo
> 
>>
>> - Mario
>>
>> On 11/17/2014 07:49 AM, Laszlo Ersek wrote:
>>> On 11/17/14 16:29, Paolo Bonzini wrote:


 On 17/11/2014 15:58, Ard Biesheuvel wrote:
> Readonly memslots are often used to implement emulation of ROMs and
> NOR flashes, in which case the guest may legally map these regions as
> uncached.
> To deal with the incoherency associated with uncached guest mappings,
> treat all readonly memslots as incoherent, and ensure that pages that
> belong to regions tagged as such are flushed to DRAM before being passed
> to the guest.

 On x86, the processor combines the cacheability values from the two
 levels of page tables.  Is there no way to do the same on ARM?
>>>
>>> Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict
>>> (Device non-Gathering, non-Reordering, no Early Write Acknowledgement --
>>> for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax
>>> host) memory attributes.
>>>
>>> When qemu writes, as part of emulating the flash programming commands,
>>> to the RAMBlock that *otherwise* backs the flash range (as a r/o
>>> memslot), those writes (from host userspace) tend to end up in dcache.
>>>
>>> But, when the guest flips back the flash to romd mode, and tries to read
>>> back the values from the flash as plain ROM, the dcache is completely
>>> bypassed due to the strict stage1 mapping, and the guest goes directly
>>> to DRAM.
>>>
>>> Where qemu's earlier writes are not yet / necessarily visible.
>>>
>>> Please see my original patch (which was incomplete) in the attachment,
>>> it has a very verbose commit message.
>>>
>>> Anyway, I'll let others explain; they can word it better than I can :)
>>>
>>> FWIW,
>>>
>>> Series
>>> Reviewed-by: Laszlo Ersek 
>>>
>>> I ported this series to a 3.17.0+ based kernel, and tested it. It works
>>> fine. The ROM-like view of the NOR flash now reflects the previously
>>> programmed contents.
>>>
>>> Series
>>> Tested-by: Laszlo Ersek 
>>>
>>> Thanks!
>>> L

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Peter Maydell

On 20 November 2014 18:35, Mario Smarduch  wrote:
> I think beyond consistency, there should be no double mappings with
> conflicting attributes at any time or CPU state is undefined.

The situation is not so bleak as this. See section B2.9 "Mismatched
memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
out in some detail what the results of mismatched attributes are
(generally, you lose ordering or coherency guarantees you might
have hoped to have). They're not pretty, but it's not as bad
as completely UNPREDICTABLE behaviour.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Mario Smarduch

On 11/20/2014 10:40 AM, Peter Maydell wrote:
> On 20 November 2014 18:35, Mario Smarduch  wrote:
>> I think beyond consistency, there should be no double mappings with
>> conflicting attributes at any time or CPU state is undefined.
> 
> The situation is not so bleak as this. See section B2.9 "Mismatched
> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
> out in some detail what the results of mismatched attributes are
> (generally, you lose ordering or coherency guarantees you might
> have hoped to have). They're not pretty, but it's not as bad
> as completely UNPREDICTABLE behaviour.
> 
> thanks
> -- PMM
> 
Hi Peter,
  thanks for digging that up, quite a list to navigate
but it does provide detailed guidance.

- Mario
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked

2014-11-20 Thread Suresh E. Warrier



On 11/20/2014 11:36 AM, Alexander Graf wrote:
> 
> 
> On 03.11.14 05:52, Paul Mackerras wrote:
>> From: "Suresh E. Warrier" 
>>
>> The kvmppc_vcore_blocked() code does not check for the wait condition
>> after putting the process on the wait queue. This means that it is
>> possible for an external interrupt to become pending, but the vcpu to
>> remain asleep until the next decrementer interrupt.  The fix is to
>> make one last check for pending exceptions and ceded state before
>> calling schedule().
>>
>> Signed-off-by: Suresh Warrier 
>> Signed-off-by: Paul Mackerras 
> 
> I don't understand the race you're fixing here. Can you please explain it?
> 

When a virtual interrupt needs to be delivered to the guest, and the
virtual ICS state for the interrupt and virtual ICP state for the VCPU
allow for the VCPU to be immediately interrupted, we
1. Set the BOOK3S_INTERRUPT_EXTERNAL_LEVEL bit in pending_exceptions.
2. Call kvmppc_fast_vcpu_kick_hv(), which checks the wait queue at vcpu->wq
   to wake the VCPU up.

The caller of kvmppc_vcore_blocked() does the check for pending exceptions, but
there is a race condition here and we do need to check again after the VCPU
is put on the wait queue.

-suresh

> 
> Alex
> 
>> ---
>>  arch/powerpc/kvm/book3s_hv.c | 20 
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index cd7e030..1a7a281 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu 
>> *vcpu, int wait_state)
>>   */
>>  static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>>  {
>> +struct kvm_vcpu *vcpu;
>> +int do_sleep = 1;
>> +
>>  DEFINE_WAIT(wait);
>>  
>>  prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
>> +
>> +/*
>> + * Check one last time for pending exceptions and ceded state after
>> + * we put ourselves on the wait queue
>> + */
>> +list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
>> +if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
>> +do_sleep = 0;
>> +break;
>> +}
>> +}
>> +
>> +if (!do_sleep) {
>> +finish_wait(&vc->wq, &wait);
>> +return;
>> +}
>> +
>>  vc->vcore_state = VCORE_SLEEPING;
>>  spin_unlock(&vc->lock);
>>  schedule();
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Jon Masters

On 11/20/2014 01:40 PM, Peter Maydell wrote:
> On 20 November 2014 18:35, Mario Smarduch  wrote:
>> I think beyond consistency, there should be no double mappings with
>> conflicting attributes at any time or CPU state is undefined.
> 
> The situation is not so bleak as this. See section B2.9 "Mismatched
> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
> out in some detail what the results of mismatched attributes are
> (generally, you lose ordering or coherency guarantees you might
> have hoped to have). They're not pretty, but it's not as bad
> as completely UNPREDICTABLE behaviour.

Quick side note that I did raise exactly this issue with the ARM
Architecture team several years ago (that of missmatched memory
attributes between a guest and hypervisor) and it is a known concern.
I'm personally concerned about a couple of things that I won't go into
here but will followup on what the longer term plan might be.

Jon.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Peter Maydell

On 20 November 2014 19:49, Jon Masters  wrote:
> On 11/20/2014 01:40 PM, Peter Maydell wrote:
>> The situation is not so bleak as this. See section B2.9 "Mismatched
>> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
>> out in some detail what the results of mismatched attributes are
>> (generally, you lose ordering or coherency guarantees you might
>> have hoped to have). They're not pretty, but it's not as bad
>> as completely UNPREDICTABLE behaviour.
>
> Quick side note that I did raise exactly this issue with the ARM
> Architecture team several years ago (that of missmatched memory
> attributes between a guest and hypervisor) and it is a known concern.

I think in practice for a well-behaved guest we can arrange
that everything is fine (roughly, the guest has to treat
DMA-capable devices as doing coherent-dma, which we can tell
them to do via DT bindings or ACPI[*], plus the special
case handling we already have for bootup), and naughty guests
will only confuse themselves. But I need to think a bit more
about it (and we should probably write down how it works
somewhere :-)).

[*] We should be able to emulate non-coherent-DMA devices but
would need an extra API from KVM so userspace can do "clean
dcache to point of coherency". And in practice I'm not sure
we need to emulate those devices...

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds

2014-11-20 Thread Linus Torvalds

On Thu, Nov 20, 2014 at 3:39 AM, Christian Borntraeger
 wrote:
>
> So It looks like we could make a change to ACCESS_ONCE. Would something like
>
> CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start?

No, if it's just a handful of places to be fixed, let's not add config
options for broken cases.

> This would boil down to
> Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set 
> + docu update + comments
> Patch2: Change mm/* to barriers
> Patch3: Change x86 locks
> Patch4: Change x86 gup
> Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86

Just do patches 2-4 first, and then patch 1 unconditionally.

Obviously you'd need to spread the word on linux-arch to see how bad
it is for other cases, but if other architectures are at all like x86
and s390, and just require a few trivial patches, let's not make this
some config option.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Laszlo Ersek

On 11/20/14 21:10, Peter Maydell wrote:
> On 20 November 2014 19:49, Jon Masters  wrote:
>> On 11/20/2014 01:40 PM, Peter Maydell wrote:
>>> The situation is not so bleak as this. See section B2.9 "Mismatched
>>> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
>>> out in some detail what the results of mismatched attributes are
>>> (generally, you lose ordering or coherency guarantees you might
>>> have hoped to have). They're not pretty, but it's not as bad
>>> as completely UNPREDICTABLE behaviour.
>>
>> Quick side note that I did raise exactly this issue with the ARM
>> Architecture team several years ago (that of missmatched memory
>> attributes between a guest and hypervisor) and it is a known concern.
> 
> I think in practice for a well-behaved guest we can arrange
> that everything is fine (roughly, the guest has to treat
> DMA-capable devices as doing coherent-dma, which we can tell
> them to do via DT bindings or ACPI[*], plus the special
> case handling we already have for bootup), and naughty guests
> will only confuse themselves. But I need to think a bit more
> about it (and we should probably write down how it works
> somewhere :-)).
> 
> [*] We should be able to emulate non-coherent-DMA devices but
> would need an extra API from KVM so userspace can do "clean
> dcache to point of coherency". And in practice I'm not sure
> we need to emulate those devices...

This basically means that virtio transfers should just use normal memory
in the guest (treating virtio transfers as "coherent DMA"), right?

We're actually doing that in the ArmVirtualizationQemu.dsc build of
edk2, and Things Work Great (TM) in guests on the Mustang.

Thanks
Laszlo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: can I make this work… (Foundation for accessibility project)

2014-11-20 Thread Paolo Bonzini



On 20/11/2014 17:28, Eric S. Johansson wrote:
> I'm accustomed to lesser performance on virtual machines. That's the
> hazard of a running on old and slow laptop  (dell e6400 (2.2ghz core
> duo, 8gb ram)[1]) but even virtual box is not this slow.  So what am I
> doing wrong? It would be nice to use a slow machine like this as many
> handcrips don't have a whole lot of resources for buying newer/faster
> machines. On the other hand, many of them use desktops and work from one
> place whereas someone like me is all over the map (quite literally).

How did you start the virtual machine?  Perhaps you're not using KVM but
emulation?  I have a fast machine but slow disk (a NAS on 100 MBit
ethernet) and I can do about 15 automated installations in less than 6
hours.

Are you using libvirt or directly invoking QEMU?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Peter Maydell

On 20 November 2014 21:13, Laszlo Ersek  wrote:
> On 11/20/14 21:10, Peter Maydell wrote:
>> I think in practice for a well-behaved guest we can arrange
>> that everything is fine (roughly, the guest has to treat
>> DMA-capable devices as doing coherent-dma, which we can tell
>> them to do via DT bindings or ACPI[*], plus the special
>> case handling we already have for bootup), and naughty guests
>> will only confuse themselves. But I need to think a bit more
>> about it (and we should probably write down how it works
>> somewhere :-)).

> This basically means that virtio transfers should just use normal memory
> in the guest (treating virtio transfers as "coherent DMA"), right?

Normal *cacheable*, but yes.

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: KVM causes #GP on XRSTORS

2014-11-20 Thread Yu, Fenghua

> From: Nadav Amit [mailto:nadav.a...@gmail.com]
> Sent: Thursday, November 20, 2014 8:34 AM
> To: Paolo Bonzini; Yu, Fenghua
> Cc: kvm list
> Subject: KVM causes #GP on XRSTORS
> 
> Fenghua,
> 
> I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears
> to get a #GP when it is trying to load the guest FPU.
> One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu,
> but I am not sure it is the only problem.
> Was KVM ever tested with XRSTORS?

In dmesg, do you see "xsave: . using compacted form"?

If you use kernel "noxsaves", does KVM work fine?

Thanks.

-Fenghua
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: can I make this work… (Foundation for accessibility project)

2014-11-20 Thread Eric S. Johansson



On 11/20/2014 4:48 PM, Paolo Bonzini wrote:


On 20/11/2014 17:28, Eric S. Johansson wrote:

I'm accustomed to lesser performance on virtual machines. That's the
hazard of a running on old and slow laptop  (dell e6400 (2.2ghz core
duo, 8gb ram)[1]) but even virtual box is not this slow.  So what am I
doing wrong? It would be nice to use a slow machine like this as many
handcrips don't have a whole lot of resources for buying newer/faster
machines. On the other hand, many of them use desktops and work from one
place whereas someone like me is all over the map (quite literally).

How did you start the virtual machine?  Perhaps you're not using KVM but
emulation?  I have a fast machine but slow disk (a NAS on 100 MBit
ethernet) and I can do about 15 automated installations in less than 6
hours.

Are you using libvirt or directly invoking QEMU?


I was using one of the GUIs ( less hand stress than trying to assemble a 
commandline). Unfortunately I'm in Windows 8 right now because I'm 
writing. I'm very sure the GUI was http://virt-manager.org/  I tried a 
different one but it kept telling me I only had QEMU I thought "silly 
program, that can't be right". Someday I will not argue with software or 
small electronic boxes. They don't care who wins and they are much more 
stubborn than I am.


I'll be able to run some tests in about 2 to 3 hours after I finish this 
document. Let me know what I should look at?  on a side note, a pointer 
to an automated install process would be wonderful.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] KVM: PPC: Book3S HV: ptes are big endian

2014-11-20 Thread Cédric Le Goater

When being restored from qemu, the kvm_get_htab_header are in native
endian, but the ptes are big endian. 

This patch fixes restore on a KVM LE host. Qemu also needs a fix for
this :

 http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html

Signed-off-by: Cédric Le Goater 
Cc: Paul Mackerras 
Cc: Alexey Kardashevskiy 
Cc: Gregory Kurz 

---

Tested on 3.18-rc5 with LE and BE host.

v2: add be64 local variables to be friendly with sparse
 

 arch/powerpc/kvm/book3s_64_mmu_hv.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
===
--- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1539,9 +1539,15 @@ static ssize_t kvm_htab_write(struct fil
hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
lbuf = (unsigned long __user *)buf;
for (j = 0; j < hdr.n_valid; ++j) {
+   __be64 hpte_v;
+   __be64 hpte_r;
+
err = -EFAULT;
-   if (__get_user(v, lbuf) || __get_user(r, lbuf + 1))
+   if (__get_user(hpte_v, lbuf) ||
+   __get_user(hpte_r, lbuf + 1))
goto out;
+   v = be64_to_cpu(hpte_v);
+   r = be64_to_cpu(hpte_r);
err = -EINVAL;
if (!(v & HPTE_V_VALID))
goto out;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM causes #GP on XRSTORS

2014-11-20 Thread Wanpeng Li

Hi Nadav,
On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote:
>Fenghua,
>
>I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to 
>get a #GP when it is trying to load the guest FPU.
>One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I 
>am not sure it is the only problem.
>Was KVM ever tested with XRSTORS?

Current kvm and qemu use standard format and xsaves/xrstors should use
compact format, in addition, vmx is still not enabled for xsaves/xrstors 
in kvm.

Regards,
Wanpeng Li 

>
>Thanks,
>Nadav
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM causes #GP on XRSTORS

2014-11-20 Thread Nadav Amit


> On Nov 21, 2014, at 02:10, Wanpeng Li  wrote:
> 
> Hi Nadav,
> On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote:
>> Fenghua,
>> 
>> I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears 
>> to get a #GP when it is trying to load the guest FPU.
>> One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but 
>> I am not sure it is the only problem.
>> Was KVM ever tested with XRSTORS?
> 
> Current kvm and qemu use standard format and xsaves/xrstors should use
> compact format, in addition, vmx is still not enabled for xsaves/xrstors 
> in kvm.

Sorry, but I don’t quite understand. Should KVM work on a machine that supports 
xsaves/xrstors?
I am not referring to whether KVM exposes the capability to the guest, but to 
whether KVM should work at all.

Thanks,
Nadav


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] vfio: put off the allocation of "minor" in vfio_create_group

2014-11-20 Thread leizhen

On 2014/11/20 23:37, Alex Williamson wrote:
> On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote:
>> The next code fragment "list_for_each_entry" is not depend on "minor". With 
>> this
>> patch, the free of "minor" in "list_for_each_entry" can be reduced, and 
>> there is
>> no functional change.
> 
> A reasonable micro-optimization, but I'm curious if you're actually
> seeing some measurable overhead from this.  It seems like we'd need to

Oh, I just found this by code review. It's good to put pure check first and
resource allocation later, if they are order independent. Isn't it?
I think it will not improve performance but only save a little code space
and looks feel better.

> have multiple devices, all within the same IOMMU group, all probed by
> vfio-pci at the same time to exercise the race condition.  Thanks,

I think vfio_create_group can not be invoked frequently. So, the performance of 
this function
is not a major consideration, we don't care slightly performance change.

> 
> Alex
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/vfio/vfio.c | 13 ++---
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index f018d8d..737eb468 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct 
>> iommu_group *iommu_group)
>>
>>  mutex_lock(&vfio.group_lock);
>>
>> -minor = vfio_alloc_group_minor(group);
>> -if (minor < 0) {
>> -vfio_group_unlock_and_free(group);
>> -return ERR_PTR(minor);
>> -}
>> -
>>  /* Did we race creating this group? */
>>  list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
>>  if (tmp->iommu_group == iommu_group) {
>>  vfio_group_get(tmp);
>> -vfio_free_group_minor(minor);
>>  vfio_group_unlock_and_free(group);
>>  return tmp;
>>  }
>>  }
>>
>> +minor = vfio_alloc_group_minor(group);
>> +if (minor < 0) {
>> +vfio_group_unlock_and_free(group);
>> +return ERR_PTR(minor);
>> +}
>> +
>>  dev = device_create(vfio.class, NULL,
>>  MKDEV(MAJOR(vfio.group_devt), minor),
>>  group, "%d", iommu_group_id(iommu_group));
>> --
>> 1.8.0
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 
> 
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] KVM: x86: fix access memslots w/o hold srcu read lock

2014-11-20 Thread Wanpeng Li

Hi Tang,
On Fri, Nov 21, 2014 at 02:41:57PM +0800, Tang Chen wrote:
>Hi Wanpeng,
>
>Sorry, it is about this problem again.
>
>I booted 3.18.0-rc2, without Paolo's patch. lockdep and RCU debug
>were all opened.
>
>Then I started a qemu vm with the following options:
>
>/usr/libexec/qemu-kvm -hda rhel7.0ga-x64.qcow2 -m 512M -cpu
>host,-x2apic -serial stdio
>
>I added printk() in kvm_vcpu_reload_apic_access_page(), and it was
>printed out.
>So I think I can confirm that I have run into
>kvm_vcpu_reload_apic_access_page().
>
>But I still didn't see any warning.
>
>Is there anything else I should do ?
>Would you please share your qemu command with me ?

I test it on the other guy's Ivytown and take advantage of the qemu command 
line which he used, so I forget the accurate command line which used that day.

Paolo also reproduce the bug, Paolo, ping.

Regards,
Wanpeng Li 

>
>Thanks. :)
>
>On 11/14/2014 07:39 AM, Wanpeng Li wrote:
>>Hi Tang,
>>On Tue, Nov 11, 2014 at 01:35:29PM +0800, Tang Chen wrote:
>>>Hi Wanpeng,
>>>
>>Sorry for the late.
>>
>>>I think I have totally missed this thread.
>>>I opened lockdep and RCU debug, and tried on 3.18-rc1. But I didn't
>>>get the warning.
>>I also opened lockdep and RCU debug, and tried 3.18.0-rc2 on a Ivy
>>bridge, the warning will be triggered after run qemu immediately. There
>>is no need to try any hotplug related stuff.
>>
>>In addition, Paolo's patch is merged upstream to fix this.
>>
>>commit a73896cb5bbdce672945745db8224352a689f580
>>Author: Paolo Bonzini 
>>Date:   Sun Nov 2 07:54:30 2014 +0100
>>
>>KVM: vmx: defer load of APIC access page address during reset
>>
>>Regards,
>>Wanpeng Li
>>
>>>My steps are:
>>>
>>>1. Use numactl to bind a qemu process to node1.
>>>2. Offline all node1 memory. And the qemu process is still running.
>>>
>>>Would you please tell me how did you reproduce it ?
>>>
>>>Thanks.
>>>
>>>On 11/02/2014 03:07 PM, Wanpeng Li wrote:
The srcu read lock must be held while accessing memslots (e.g.
when using gfn_to_* functions), however, commit c24ae0dcd3e8
("kvm: x86: Unpin and remove kvm_arch->apic_access_page") call
gfn_to_page() in kvm_vcpu_reload_apic_access_page() w/o hold it in
vmx_vcpu_reset() path which leads to suspicious rcu_dereference_check()
usage warning. This patch fix it by holding srcu read lock in all
kvm_vcpu_reset() call path.


[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
---
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
  #0:  (&vcpu->mutex){+.+...}, at: [] vcpu_load+0x20/0xd0 
 [kvm]

stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
  0001 880209983ca8 816f514f 
  8802099b8990 880209983cd8 810bd687 000fee00
  880208a2c000 880208a1 88020ef50040 880209983d08
Call Trace:
  [] dump_stack+0x4e/0x71
  [] lockdep_rcu_suspicious+0xe7/0x120
  [] gfn_to_memslot+0xd5/0xe0 [kvm]
  [] __gfn_to_pfn+0x33/0x60 [kvm]
  [] gfn_to_page+0x25/0x90 [kvm]
  [] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
  [] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
  [] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
  [] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
  [] kvm_vm_ioctl+0x1d0/0x780 [kvm]
  [] ? __lock_is_held+0x54/0x80
  [] do_vfs_ioctl+0x300/0x520
  [] ? __fget+0x5/0x250
  [] ? __fget_light+0x2a/0xe0
  [] SyS_ioctl+0x81/0xa0
  [] system_call_fastpath+0x16/0x1b

Reported-by: Takashi Iwai 
Reported-by: Alexei Starovoitov 
Suggested-by: Paolo Bonzini 
Signed-off-by: Wanpeng Li 
---
v3 -> v4:
  * bypass the problem altoghter by kvm_make_request
v2 -> v3:
  * take care all vmx_vcpu_reset call path
v1 -> v2:
  * just fix hold the srcu read lock in vmx_vcpu_reset path

  arch/x86/kvm/vmx.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a0f78db..3e556c6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4579,7 +4579,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}
-   kvm_vcpu_reload_apic_access_page(vcpu);
+   kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
if (vmx_vm_has_apicv(vcpu->kvm))
memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
>>.
>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread zhanghailiang


On 2014/11/21 1:38, Andrea Arcangeli wrote:

Hi,

On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote:

Yes, you are right. This is what i really want, bypass all non-present faults
and only track strict wrprotect faults. ;)

So, do you plan to support that in the userfault API?


Yes I think it's good idea to support wrprotect/COW faults too.



Great! Then i can expect your patches. ;)


I just wanted to understand if there was any other reason why you
needed only wrprotect faults, because the non-present faults didn't
look like a big performance concern if they triggered in addition to
wrprotect faults, but it's certainly ok to optimize them away so it's
fully optimal.



Er, you have got the answer, no special, it's only for optimality.


All it takes to differentiate the behavior should be one more bit
during registration so you can select non-present, wrprotect faults or
both. postcopy live migration would select only non-present faults,
postcopy live snapshot would select only wrprotect faults, anything
like distributed shared memory supporting shared readonly access and
exclusive write access, would select both flags.



It is really flexible in this way.


I just sent an (unfortunately) longish but way more detailed email
about live snapshotting with userfaultfd but I just wanted to give a
shorter answer here too :).



Thanks for your explanation, and your patience. It is really useful,
now i know more details about why 'fork & dump live snapshot' scenario
is not acceptable. Thanks :)


Thanks,
Andrea

.




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

51 matches

Mail list logo