Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote: > On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall > wrote: > > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote: > >> Hi All, > >> > >> I have second thoughts about rebasing KVM PMU patches > >> to Marc's irq-forwarding patches. > >> > >> The PMU IRQs (when virtualized by KVM) are not exactly > >> forwarded IRQs because they are shared between Host > >> and Guest. > >> > >> Scenario1 > >> - > >> > >> We might have perf running on Host and no KVM guest > >> running. In this scenario, we wont get interrupts on Host > >> because the kvm_pmu_hyp_init() (similar to the function > >> kvm_timer_hyp_init() of Marc's IRQ-forwarding > >> implementation) has put all host PMU IRQs in forwarding > >> mode. > >> > >> The only way solve this problem is to not set forwarding > >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead > >> have special routines to turn on and turn off the forwarding > >> mode of PMU IRQs. These routines will be called from > >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ > >> forwarding state. > >> > >> Scenario2 > >> - > >> > >> We might have perf running on Host and Guest simultaneously > >> which means it is quite likely that PMU HW trigger IRQ meant > >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" > >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine > >> of Marc's patchset which is called before local_irq_enable()). > >> > >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu) > >> will accidentally forward IRQ meant for Host to Guest unless > >> we put additional checks to inspect VCPU PMU state. > >> > >> Am I missing any detail about IRQ forwarding for above > >> scenarios? > >> > > Hi Anup, > > Hi Christoffer, > > > > > I briefly discussed this with Marc. What I don't understand is how it > > would be possible to get an interrupt for the host while running the > > guest? > > > > The rationale behind my question is that whenever you're running the > > guest, the PMU should be programmed exclusively with guest state, and > > since the PMU is per core, any interrupts should be for the guest, where > > it would always be pending. > > Yes, thats right PMU is programmed exclusively for guest when > guest is running and for host when host is running. > > Let us assume a situation (Scenario2 mentioned previously) > where both host and guest are using PMU. When the guest is > running we come back to host mode due to variety of reasons > (stage2 fault, guest IO, regular host interrupt, host interrupt > meant for guest, ) which means we will return from the > "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the > kvm_arch_vcpu_ioctl_run() function with local IRQs disabled. > At this point we would have restored back host PMU context and > any PMU counter used by host can trigger PMU overflow interrup > for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);" > in the kvm_arch_vcpu_ioctl_run() function (similar to the > kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset) > which will try to detect PMU irq forwarding state in GIC hence it > can accidentally discover PMU irq pending for guest while this > PMU irq is actually meant for host. > > This above mentioned situation does not happen for timer > because virtual timer interrupts are exclusively used for guest. > The exclusive use of virtual timer interrupt for guest ensures that > the function kvm_timer_sync_hwstate() will always see correct > state of virtual timer IRQ from GIC. 
> I'm not quite following. When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemtible section, you would (1) capture the active state of the IRQ pertaining to the guest and (2) deactive the IRQ on the host, then (3) switch the state of the PMU to the host state, and finally (4) re-enable IRQs on the CPU you're running on. If the host PMU state restored in (3) causes the PMU to raise an interrupt, you'll take an interrupt after (4), which is for the host, and you'll handle it on the host. Whenever you schedule the guest VCPU again, you'll (a) disable interrupts on the CPU, (b) restore the active state of the IRQ for the guest, (c) restore the guest PMU state, (d) switch to the guest with IRQs enabled on the CPU (potentially). If the state in (c) causes an IRQ it will not fire on the host, because it is marked as active in (b). Where does this break? -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
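To make the ordering above concrete, the run loop being described looks roughly like the sketch below. kvm_pmu_restore_host_state() is a made-up name for step (3); as the follow-up later in this thread points out, the RFC actually performs that switch inside __kvm_vcpu_run, which is where the disagreement lies.

    /* in kvm_arch_vcpu_ioctl_run(), local IRQs still disabled at this point */
    ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);

    /* (1) capture the active state of the guest's PMU IRQ and
     * (2) deactivate it on the host so it stays pending for the guest */
    kvm_pmu_sync_hwstate(vcpu);
    kvm_timer_sync_hwstate(vcpu);

    /* (3) restore the host PMU context (hypothetical helper) */
    kvm_pmu_restore_host_state(vcpu);

    /* (4) a host PMU overflow raised by the state restored in (3) fires
     * here, after IRQs are re-enabled, and is handled by the host */
    local_irq_enable();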
Re: [PATCH 3/3] arm/arm64: Enable Dirty Page logging for ARMv8 move log read, tlb flush to generic code
On Wed, Nov 19, 2014 at 12:15:55PM -0800, Mario Smarduch wrote: > On 11/19/2014 06:39 AM, Christoffer Dall wrote: > > Hi Mario, > > > > On Fri, Nov 07, 2014 at 12:51:39PM -0800, Mario Smarduch wrote: > >> On 11/07/2014 12:20 PM, Christoffer Dall wrote: > >>> On Thu, Oct 09, 2014 at 07:34:07PM -0700, Mario Smarduch wrote: > This patch enables ARMv8 dirty page logging and unifies ARMv7/ARMv8 code. > > Signed-off-by: Mario Smarduch > --- > arch/arm/include/asm/kvm_host.h | 12 > arch/arm/kvm/arm.c | 9 - > arch/arm/kvm/mmu.c | 17 +++-- > arch/arm64/kvm/Kconfig | 2 +- > 4 files changed, 12 insertions(+), 28 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_host.h > b/arch/arm/include/asm/kvm_host.h > index 12311a5..59565f5 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -220,18 +220,6 @@ static inline void __cpu_init_hyp_mode(phys_addr_t > boot_pgd_ptr, > kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr); > } > > -/** > - * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries > - * @kvm: pointer to kvm structure. > - * > - * Interface to HYP function to flush all VM TLB entries without address > - * parameter. > - */ > -static inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm) > -{ > -kvm_call_hyp(__kvm_tlb_flush_vmid, kvm); > -} > - > static inline int kvm_arch_dev_ioctl_check_extension(long ext) > { > return 0; > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index 0546fa3..6a6fd6b 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -242,7 +242,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, > const struct kvm_memory_slot *old, > enum kvm_mr_change change) > { > -#ifdef CONFIG_ARM > /* > * At this point memslot has been committed and there is an > * allocated dirty_bitmap[], dirty pages will be be tracked > while the > @@ -250,7 +249,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, > */ > if ((change != KVM_MR_DELETE) && (mem->flags & > KVM_MEM_LOG_DIRTY_PAGES)) > kvm_mmu_wp_memory_region(kvm, mem->slot); > -#endif > } > > void kvm_arch_flush_shadow_all(struct kvm *kvm) > @@ -783,13 +781,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp, > } > } > > -#ifdef CONFIG_ARM64 > -int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct > kvm_dirty_log *log) > -{ > -return -EINVAL; > -} > -#endif > - > static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm, > struct kvm_arm_device_addr > *dev_addr) > { > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index df1a5a3..8c0f9f2 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -49,11 +49,18 @@ static phys_addr_t hyp_idmap_vector; > > static bool kvm_get_logging_state(struct kvm_memory_slot *memslot) > { > -#ifdef CONFIG_ARM > return !!memslot->dirty_bitmap; > -#else > -return false; > -#endif > +} > + > +/** > + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries for ARMv7/8 > + * @kvm:pointer to kvm structure. 
> + * > + * Interface to HYP function to flush all VM TLB entries > + */ > +inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm) > +{ > +kvm_call_hyp(__kvm_tlb_flush_vmid, kvm); > } > > static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa) > @@ -769,7 +776,6 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, > phys_addr_t *ipap) > return false; > } > > -#ifdef CONFIG_ARM > /** > * stage2_wp_ptes - write protect PMD range > * @pmd:pointer to pmd entry > @@ -917,7 +923,6 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, > > stage2_wp_range(kvm, start, end); > } > -#endif > > static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > struct kvm_memory_slot *memslot, > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig > index 40a8d19..a1a35809 100644 > --- a/arch/arm64/kvm/Kconfig > +++ b/arch/arm64/kvm/Kconfig > @@ -26,7 +26,7 @@ config KVM > select KVM_ARM_HOST > select KVM_ARM_VGIC > select KVM_ARM_TIMER > -
Re: Exposing host debug capabilities to userspace
On Thu, Nov 20, 2014 at 04:55:14PM +, Alex Bennée wrote: > Hi, > > I've almost finished the ARMv8 guest debug support but I have one > problem left to solve. userspace needs to know how many hardware debug > registers are available for GDB to use. This information is available > from the ID_AA64DFR0_EL1 register. Currently I abuse GET_ONE_REG to > fetch its value; however, semantically this is poor as its API is for > getting guest state, not host state, and they could theoretically have > different values. > > So far the options I've examined are: > > * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1) > > As explained above, abusing a guest state API for host configuration. It's just wrong, and we should only do this if there's absolutely no other way to do this. > > * ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH) > > This is used by GDB to access the host details in debug-monitors. > However the ptrace API really wants you to attach to a process before > calling PTRACE_GETREGSET. Currently I've tried attaching to the > thread_id of the vCPU but this fails with EPERM, I suspect because > attaching to your own threads likely upsets the kernel. Can you confirm your suspicion? This seems like a rather good approach so we should really investigate why this doesn't work and explore ways to get it working. > > * KVM ioctl KVM_GET_DEBUGREGS > > This is currently x86 only and looks like it's more aimed at debug > registers than capability stuff. Also I'm not sure what the state of > this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or > is one an older deprecated x86 only API? The API text and a brief glance at the x86 code seem to indicate that this is also the vcpu state... > > * Export the information via sysfs > > I suppose the correct canonical non-subsystem specific way to make this > information available is to expose the data in some sort of sysfs node? > However I don't see any existing sysfs structure for the CPU. > > * Expand /proc/cpuinfo > > I suspect adding extra text to be badly parsed by userspace is just > horrid and unacceptable behaviour ;-) > > * Add another KVM ioctl? > > This would have the downside of being specific to KVM and of course > proliferating the API space again. > This may not be that bad, for example, could we ever imagine that we'd only want to export a few of the debug registers for host gdbstub usage? -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
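For reference, the ptrace option boils down to a single regset read; roughly what the GDB path does is sketched below, with the field layout taken from arm64's struct user_hwdebug_state (the low byte of dbg_info holds the slot count). Note it only works on a thread the caller is already tracing, which is exactly the EPERM problem described above.

    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <sys/uio.h>
    #include <linux/elf.h>      /* NT_ARM_HW_WATCH */
    #include <asm/ptrace.h>     /* struct user_hwdebug_state (arm64) */

    /* query how many hardware watchpoint slots a traced thread exposes */
    static int hw_watchpoint_slots(pid_t tid)
    {
            struct user_hwdebug_state dbg;
            struct iovec iov = { .iov_base = &dbg, .iov_len = sizeof(dbg) };

            if (ptrace(PTRACE_GETREGSET, tid, (void *)NT_ARM_HW_WATCH, &iov) < 0)
                    return -1;      /* EPERM here if tid is not traced by us */

            return dbg.dbg_info & 0xff;     /* number of watchpoint registers */
    }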
Re: Exposing host debug capabilities to userspace
Christoffer Dall writes: > On Thu, Nov 20, 2014 at 04:55:14PM +, Alex Bennée wrote: >> Hi, >> >> I've almost finished the ARMv8 guest debug support but I have one >> problem left to solve. userspace needs to know how many hardware debug >> registers are available for GDB to use. >> * KVM ioctl KVM_GET_DEBUGREGS >> >> This is currently x86 only and looks like it's more aimed at debug >> registers than capability stuff. Also I'm not sure what the state of >> this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or >> is one an older deprecated x86 only API? > > The API text and a brief glance of the x86 code seems to indicate that > this is also the vcpu state... Yeah I was getting confused as to the difference between the two API calls. Is this just an x86 version of what GET/SET_ONE_REG replaced? >> * Add another KVM ioctl? >> >> This would have the downside of being specific to KVM and of course >> proliferating the API space again. >> > This may not be that bad, for example, could we ever imaging that we'd > only want to export a few of the debug registers for host gdbstub > usage? However it is general information which might be useful to the whole system (although I suspect KVM and PTRACE are the only two). It would be a shame to have an informational API wrapped up in the extra boiler-plate of a specific API. -- Alex Bennée -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
Hi Christoffer, On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall wrote: > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote: >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall >> wrote: >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote: >> >> Hi All, >> >> >> >> I have second thoughts about rebasing KVM PMU patches >> >> to Marc's irq-forwarding patches. >> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly >> >> forwarded IRQs because they are shared between Host >> >> and Guest. >> >> >> >> Scenario1 >> >> - >> >> >> >> We might have perf running on Host and no KVM guest >> >> running. In this scenario, we wont get interrupts on Host >> >> because the kvm_pmu_hyp_init() (similar to the function >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding >> >> implementation) has put all host PMU IRQs in forwarding >> >> mode. >> >> >> >> The only way solve this problem is to not set forwarding >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead >> >> have special routines to turn on and turn off the forwarding >> >> mode of PMU IRQs. These routines will be called from >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ >> >> forwarding state. >> >> >> >> Scenario2 >> >> - >> >> >> >> We might have perf running on Host and Guest simultaneously >> >> which means it is quite likely that PMU HW trigger IRQ meant >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine >> >> of Marc's patchset which is called before local_irq_enable()). >> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu) >> >> will accidentally forward IRQ meant for Host to Guest unless >> >> we put additional checks to inspect VCPU PMU state. >> >> >> >> Am I missing any detail about IRQ forwarding for above >> >> scenarios? >> >> >> > Hi Anup, >> >> Hi Christoffer, >> >> > >> > I briefly discussed this with Marc. What I don't understand is how it >> > would be possible to get an interrupt for the host while running the >> > guest? >> > >> > The rationale behind my question is that whenever you're running the >> > guest, the PMU should be programmed exclusively with guest state, and >> > since the PMU is per core, any interrupts should be for the guest, where >> > it would always be pending. >> >> Yes, thats right PMU is programmed exclusively for guest when >> guest is running and for host when host is running. >> >> Let us assume a situation (Scenario2 mentioned previously) >> where both host and guest are using PMU. When the guest is >> running we come back to host mode due to variety of reasons >> (stage2 fault, guest IO, regular host interrupt, host interrupt >> meant for guest, ) which means we will return from the >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled. >> At this point we would have restored back host PMU context and >> any PMU counter used by host can trigger PMU overflow interrup >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);" >> in the kvm_arch_vcpu_ioctl_run() function (similar to the >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset) >> which will try to detect PMU irq forwarding state in GIC hence it >> can accidentally discover PMU irq pending for guest while this >> PMU irq is actually meant for host. >> >> This above mentioned situation does not happen for timer >> because virtual timer interrupts are exclusively used for guest. 
>> The exclusive use of virtual timer interrupt for guest ensures that >> the function kvm_timer_sync_hwstate() will always see correct >> state of virtual timer IRQ from GIC. >> > I'm not quite following. > > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemtible section, > you would (1) capture the active state of the IRQ pertaining to the > guest and (2) deactive the IRQ on the host, then (3) switch the state of > the PMU to the host state, and finally (4) re-enable IRQs on the CPU > you're running on. > > If the host PMU state restored in (3) causes the PMU to raise an > interrupt, you'll take an interrupt after (4), which is for the host, > and you'll handle it on the host. > We only switch PMU state in assembly code using kvm_call_hyp(__kvm_vcpu_run, vcpu) so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode) the current hardware PMU state is for host. This means whenever we are in host mode the host PMU can change state of PMU IRQ in GIC even if local IRQs are disabled. Whenever we inspect active state of PMU IRQ in the kvm_pmu_sync_hwstate() function using irq_get_fwd_state() API. Here we are not guaranteed that IRQ forward state returned by the irq_get_fwd_state() API is for guest only. The above situation does not manifest for virtual timer because virtual timer registers are exclusively accessed by Guest and virtual timer interrupt is only for Guest (never used by Host). > Whenever yo
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
Hi Mario, On Wed, Nov 19, 2014 at 03:32:31PM -0800, Mario Smarduch wrote: > Hi Laszlo, > > a couple of observations. > > I'm wondering if access from qemu and guest won't > result in mixed memory attributes and if that's acceptable > to the CPU. > > Also, if you update memory from qemu you may break > dirty page logging/migration. Unless there is some other way > you keep track. Of course it may not be applicable in your > case (i.e. flash unused after boot). > I'm not concerned about this particular case; dirty page logging exists so KVM can inform userspace when a page may have been dirtied. If userspace directly dirties (is that a verb?) a page, then it already knows that it needs to migrate that page and deal with it accordingly. Or did I miss some more subtle point here? -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
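For context, the userspace side of that contract is just KVM_GET_DIRTY_LOG per memslot; pages userspace writes directly have to be tracked by userspace itself on top of KVM's log. A minimal sketch under that assumption (slot number, bitmap allocation and sizing are illustrative, not QEMU code):

    #include <stdint.h>
    #include <stddef.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* fetch KVM's dirty bitmap for one memslot and OR in the pages that
     * userspace already knows it wrote directly */
    static int sync_dirty_bitmap(int vm_fd, uint32_t slot,
                                 unsigned long *kvm_bits,
                                 const unsigned long *user_bits,
                                 unsigned long *merged, size_t words)
    {
            struct kvm_dirty_log log = {
                    .slot = slot,
                    .dirty_bitmap = kvm_bits,
            };
            size_t i;

            if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
                    return -1;

            for (i = 0; i < words; i++)
                    merged[i] = kvm_bits[i] | user_bits[i];
            return 0;
    }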
Re: Exposing host debug capabilities to userspace
Christoffer Dall writes: > On Thu, Nov 20, 2014 at 04:55:14PM +, Alex Bennée wrote: >> >> * ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH) >> >> This is used by GDB to access the host details in debug-monitors. >> However the ptrace API really wants you to attach to a process before >> calling PTRACE_GETREGSET. Currently I've tried attaching to the >> thread_id of the vCPU but this fails with EPERM, I suspect because >> attaching to your own threads likely upsets the kernel. > > Can you confirm your suspicion? This seems like a rather good approach > so we should really investigate why this doesn't work and explore ways > to get it working. From ptrace_attach: retval = -EPERM; if (unlikely(task->flags & PF_KTHREAD)) goto out; if (same_thread_group(task, current)) goto out; I think this is what is triggering my EPERM. I'm going to dig into the history of code around that bit. While I can see it might be undesirable, I'm not sure if it has to be verboten... -- Alex Bennée -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()
On Mon, Nov 10, 2014 at 09:33:55AM +0100, Ard Biesheuvel wrote: > Instead of using kvm_is_mmio_pfn() to decide whether a host region > should be stage 2 mapped with device attributes, add a new static > function kvm_is_device_pfn() that disregards RAM pages with the > reserved bit set, as those should usually not be mapped as device > memory. > > Signed-off-by: Ard Biesheuvel > --- > arch/arm/kvm/mmu.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 57a403a5c22b..b007438242e2 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -834,6 +834,11 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu) > return kvm_vcpu_dabt_iswrite(vcpu); > } > > +static bool kvm_is_device_pfn(unsigned long pfn) > +{ > + return !pfn_valid(pfn); > +} > + > static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > struct kvm_memory_slot *memslot, unsigned long hva, > unsigned long fault_status) > @@ -904,7 +909,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > if (is_error_pfn(pfn)) > return -EFAULT; > > - if (kvm_is_mmio_pfn(pfn)) > + if (kvm_is_device_pfn(pfn)) > mem_type = PAGE_S2_DEVICE; > > spin_lock(&kvm->mmu_lock); > -- > 1.8.3.2 > Acked-by: Christoffer Dall -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
On 10 November 2014 09:33, Ard Biesheuvel wrote: > This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in > kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. > > The problem being addressed by the patch above was that some ARM code > based the memory mapping attributes of a pfn on the return value of > kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should > be mapped as device memory. > > However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, > and the existing non-ARM users were already using it in a way which > suggests that its name should probably have been 'kvm_is_reserved_pfn' > from the beginning, e.g., whether or not to call get_page/put_page on > it etc. This means that returning false for the zero page is a mistake > and the patch above should be reverted. > > Signed-off-by: Ard Biesheuvel Ping? > --- > arch/ia64/kvm/kvm-ia64.c | 2 +- > arch/x86/kvm/mmu.c | 6 +++--- > include/linux/kvm_host.h | 2 +- > virt/kvm/kvm_main.c | 16 > 4 files changed, 13 insertions(+), 13 deletions(-) > > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c > index ec6b9acb6bea..dbe46f43884d 100644 > --- a/arch/ia64/kvm/kvm-ia64.c > +++ b/arch/ia64/kvm/kvm-ia64.c > @@ -1563,7 +1563,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > > for (i = 0; i < npages; i++) { > pfn = gfn_to_pfn(kvm, base_gfn + i); > - if (!kvm_is_mmio_pfn(pfn)) { > + if (!kvm_is_reserved_pfn(pfn)) { > kvm_set_pmt_entry(kvm, base_gfn + i, > pfn << PAGE_SHIFT, > _PAGE_AR_RWX | _PAGE_MA_WB); > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index ac1c4de3a484..978f402006ee 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -630,7 +630,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) > * kvm mmu, before reclaiming the page, we should > * unmap it from mmu first. > */ > - WARN_ON(!kvm_is_mmio_pfn(pfn) && !page_count(pfn_to_page(pfn))); > + WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn))); > > if (!shadow_accessed_mask || old_spte & shadow_accessed_mask) > kvm_set_pfn_accessed(pfn); > @@ -2461,7 +2461,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, > spte |= PT_PAGE_SIZE_MASK; > if (tdp_enabled) > spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, > - kvm_is_mmio_pfn(pfn)); > + kvm_is_reserved_pfn(pfn)); > > if (host_writable) > spte |= SPTE_HOST_WRITEABLE; > @@ -2737,7 +2737,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu > *vcpu, > * PT_PAGE_TABLE_LEVEL and there would be no adjustment done > * here. 
> */ > - if (!is_error_noslot_pfn(pfn) && !kvm_is_mmio_pfn(pfn) && > + if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) && > level == PT_PAGE_TABLE_LEVEL && > PageTransCompound(pfn_to_page(pfn)) && > !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) { > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index ea53b04993f2..a6059bdf7b03 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -703,7 +703,7 @@ void kvm_arch_sync_events(struct kvm *kvm); > int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); > void kvm_vcpu_kick(struct kvm_vcpu *vcpu); > > -bool kvm_is_mmio_pfn(pfn_t pfn); > +bool kvm_is_reserved_pfn(pfn_t pfn); > > struct kvm_irq_ack_notifier { > struct hlist_node link; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 25ffac9e947d..3cee7b167052 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -107,10 +107,10 @@ EXPORT_SYMBOL_GPL(kvm_rebooting); > > static bool largepages_enabled = true; > > -bool kvm_is_mmio_pfn(pfn_t pfn) > +bool kvm_is_reserved_pfn(pfn_t pfn) > { > if (pfn_valid(pfn)) > - return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)); > + return PageReserved(pfn_to_page(pfn)); > > return true; > } > @@ -1321,7 +1321,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool > atomic, bool *async, > else if ((vma->vm_flags & VM_PFNMAP)) { > pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + > vma->vm_pgoff; > - BUG_ON(!kvm_is_mmio_pfn(pfn)); > + BUG_ON(!kvm_is_reserved_pfn(pfn)); > } else { > if (async && vma_is_valid(vma, write_fault)) > *async = true; > @@ -1427,7 +1427,7 @@ static struct page *kvm_pfn_to_page(pfn_t pfn) > if (is_error_noslot_pfn(pfn)) > return KVM_ERR_PTR_BAD_PAGE; > > - if (kvm_is_mmio_pfn(pfn)) { > + if (kvm_is_reserved_pfn(pfn)) { >
Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
Hi Paolo, I think these look good, would you mind queueing them as either a fix or for 3.19 as you see fit, assuming you agree with the content? Thanks, -Christoffer On Mon, Nov 10, 2014 at 09:33:56AM +0100, Ard Biesheuvel wrote: > This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in > kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. > > The problem being addressed by the patch above was that some ARM code > based the memory mapping attributes of a pfn on the return value of > kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should > be mapped as device memory. > > However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, > and the existing non-ARM users were already using it in a way which > suggests that its name should probably have been 'kvm_is_reserved_pfn' > from the beginning, e.g., whether or not to call get_page/put_page on > it etc. This means that returning false for the zero page is a mistake > and the patch above should be reverted. > > Signed-off-by: Ard Biesheuvel > --- > arch/ia64/kvm/kvm-ia64.c | 2 +- > arch/x86/kvm/mmu.c | 6 +++--- > include/linux/kvm_host.h | 2 +- > virt/kvm/kvm_main.c | 16 > 4 files changed, 13 insertions(+), 13 deletions(-) > > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c > index ec6b9acb6bea..dbe46f43884d 100644 > --- a/arch/ia64/kvm/kvm-ia64.c > +++ b/arch/ia64/kvm/kvm-ia64.c > @@ -1563,7 +1563,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > > for (i = 0; i < npages; i++) { > pfn = gfn_to_pfn(kvm, base_gfn + i); > - if (!kvm_is_mmio_pfn(pfn)) { > + if (!kvm_is_reserved_pfn(pfn)) { > kvm_set_pmt_entry(kvm, base_gfn + i, > pfn << PAGE_SHIFT, > _PAGE_AR_RWX | _PAGE_MA_WB); > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index ac1c4de3a484..978f402006ee 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -630,7 +630,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) >* kvm mmu, before reclaiming the page, we should >* unmap it from mmu first. >*/ > - WARN_ON(!kvm_is_mmio_pfn(pfn) && !page_count(pfn_to_page(pfn))); > + WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn))); > > if (!shadow_accessed_mask || old_spte & shadow_accessed_mask) > kvm_set_pfn_accessed(pfn); > @@ -2461,7 +2461,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, > spte |= PT_PAGE_SIZE_MASK; > if (tdp_enabled) > spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, > - kvm_is_mmio_pfn(pfn)); > + kvm_is_reserved_pfn(pfn)); > > if (host_writable) > spte |= SPTE_HOST_WRITEABLE; > @@ -2737,7 +2737,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu > *vcpu, >* PT_PAGE_TABLE_LEVEL and there would be no adjustment done >* here. 
>*/ > - if (!is_error_noslot_pfn(pfn) && !kvm_is_mmio_pfn(pfn) && > + if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) && > level == PT_PAGE_TABLE_LEVEL && > PageTransCompound(pfn_to_page(pfn)) && > !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) { > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index ea53b04993f2..a6059bdf7b03 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -703,7 +703,7 @@ void kvm_arch_sync_events(struct kvm *kvm); > int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); > void kvm_vcpu_kick(struct kvm_vcpu *vcpu); > > -bool kvm_is_mmio_pfn(pfn_t pfn); > +bool kvm_is_reserved_pfn(pfn_t pfn); > > struct kvm_irq_ack_notifier { > struct hlist_node link; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 25ffac9e947d..3cee7b167052 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -107,10 +107,10 @@ EXPORT_SYMBOL_GPL(kvm_rebooting); > > static bool largepages_enabled = true; > > -bool kvm_is_mmio_pfn(pfn_t pfn) > +bool kvm_is_reserved_pfn(pfn_t pfn) > { > if (pfn_valid(pfn)) > - return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)); > + return PageReserved(pfn_to_page(pfn)); > > return true; > } > @@ -1321,7 +1321,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool > atomic, bool *async, > else if ((vma->vm_flags & VM_PFNMAP)) { > pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + > vma->vm_pgoff; > - BUG_ON(!kvm_is_mmio_pfn(pfn)); > + BUG_ON(!kvm_is_reserved_pfn(pfn)); > } else { > if (async && vma_is_valid(vma, write_fault)) > *async = true; > @@ -1427,7 +1427,7 @@ static struct page *kvm_pfn_to_page(pfn_t pfn) > if (is_error_noslot_pfn(pfn)) > return KVM_ERR_P
Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote: > Hi Christoffer, > > On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall > wrote: > > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote: > >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall > >> wrote: > >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote: > >> >> Hi All, > >> >> > >> >> I have second thoughts about rebasing KVM PMU patches > >> >> to Marc's irq-forwarding patches. > >> >> > >> >> The PMU IRQs (when virtualized by KVM) are not exactly > >> >> forwarded IRQs because they are shared between Host > >> >> and Guest. > >> >> > >> >> Scenario1 > >> >> - > >> >> > >> >> We might have perf running on Host and no KVM guest > >> >> running. In this scenario, we wont get interrupts on Host > >> >> because the kvm_pmu_hyp_init() (similar to the function > >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding > >> >> implementation) has put all host PMU IRQs in forwarding > >> >> mode. > >> >> > >> >> The only way solve this problem is to not set forwarding > >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead > >> >> have special routines to turn on and turn off the forwarding > >> >> mode of PMU IRQs. These routines will be called from > >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ > >> >> forwarding state. > >> >> > >> >> Scenario2 > >> >> - > >> >> > >> >> We might have perf running on Host and Guest simultaneously > >> >> which means it is quite likely that PMU HW trigger IRQ meant > >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" > >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine > >> >> of Marc's patchset which is called before local_irq_enable()). > >> >> > >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu) > >> >> will accidentally forward IRQ meant for Host to Guest unless > >> >> we put additional checks to inspect VCPU PMU state. > >> >> > >> >> Am I missing any detail about IRQ forwarding for above > >> >> scenarios? > >> >> > >> > Hi Anup, > >> > >> Hi Christoffer, > >> > >> > > >> > I briefly discussed this with Marc. What I don't understand is how it > >> > would be possible to get an interrupt for the host while running the > >> > guest? > >> > > >> > The rationale behind my question is that whenever you're running the > >> > guest, the PMU should be programmed exclusively with guest state, and > >> > since the PMU is per core, any interrupts should be for the guest, where > >> > it would always be pending. > >> > >> Yes, thats right PMU is programmed exclusively for guest when > >> guest is running and for host when host is running. > >> > >> Let us assume a situation (Scenario2 mentioned previously) > >> where both host and guest are using PMU. When the guest is > >> running we come back to host mode due to variety of reasons > >> (stage2 fault, guest IO, regular host interrupt, host interrupt > >> meant for guest, ) which means we will return from the > >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the > >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled. > >> At this point we would have restored back host PMU context and > >> any PMU counter used by host can trigger PMU overflow interrup > >> for host. 
Now we will be having "kvm_pmu_sync_hwstate(vcpu);" > >> in the kvm_arch_vcpu_ioctl_run() function (similar to the > >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset) > >> which will try to detect PMU irq forwarding state in GIC hence it > >> can accidentally discover PMU irq pending for guest while this > >> PMU irq is actually meant for host. > >> > >> This above mentioned situation does not happen for timer > >> because virtual timer interrupts are exclusively used for guest. > >> The exclusive use of virtual timer interrupt for guest ensures that > >> the function kvm_timer_sync_hwstate() will always see correct > >> state of virtual timer IRQ from GIC. > >> > > I'm not quite following. > > > > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemtible section, > > you would (1) capture the active state of the IRQ pertaining to the > > guest and (2) deactive the IRQ on the host, then (3) switch the state of > > the PMU to the host state, and finally (4) re-enable IRQs on the CPU > > you're running on. > > > > If the host PMU state restored in (3) causes the PMU to raise an > > interrupt, you'll take an interrupt after (4), which is for the host, > > and you'll handle it on the host. > > > We only switch PMU state in assembly code using > kvm_call_hyp(__kvm_vcpu_run, vcpu) > so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode) > the current hardware PMU state is for host. This means whenever > we are in host mode the host PMU can change state of PMU IRQ > in GIC even if local IRQs are disabled. > > Whenever we inspect active state of PMU IRQ in the > kvm_pmu_sync_hwstate() function using irq_get_fwd_state() API. > Here we are not guaranteed that
Re: [PATCH v1] ARM/ARM64: support KVM_IOEVENTFD
Hi Ming, for your information there is a series written by Antonios (added in CC) https://lists.cs.columbia.edu/pipermail/kvmarm/2014-March/008416.html exactly on the same topic. The thread was reactivated by Nikolay latterly on Nov (see http://www.gossamer-threads.com/lists/linux/kernel/1886716?page=last). I am also convinced we must progress on ioeventfd topic concurrently with irqfd one. What starting point do we use then for further comments? Best Regards Eric On 11/19/2014 06:16 AM, Ming Lei wrote: > From Documentation/virtual/kvm/api.txt, all ARCHs should support > ioeventfd. > > Also ARM VM has supported PCI bus already, and ARM64 will do too, > ioeventfd is required for some popular devices, like virtio-blk > and virtio-scsi dataplane in QEMU. > > Without this patch, virtio-blk-pci dataplane can't work in QEMU. > > This patch has been tested on both ARM and ARM64. > > Signed-off-by: Ming Lei > --- > v1: > - make eventfd.o built in ARM64 > arch/arm/kvm/Kconfig|1 + > arch/arm/kvm/Makefile |2 +- > arch/arm/kvm/arm.c |1 + > arch/arm/kvm/mmio.c | 19 +++ > arch/arm64/kvm/Kconfig |1 + > arch/arm64/kvm/Makefile |2 +- > 6 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig > index 466bd29..25bd83a 100644 > --- a/arch/arm/kvm/Kconfig > +++ b/arch/arm/kvm/Kconfig > @@ -23,6 +23,7 @@ config KVM > select HAVE_KVM_CPU_RELAX_INTERCEPT > select KVM_MMIO > select KVM_ARM_HOST > + select HAVE_KVM_EVENTFD > depends on ARM_VIRT_EXT && ARM_LPAE > ---help--- > Support hosting virtualized guest machines. You will also > diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile > index f7057ed..859db09 100644 > --- a/arch/arm/kvm/Makefile > +++ b/arch/arm/kvm/Makefile > @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) > AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) > > KVM := ../../../virt/kvm > -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o > +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o > > obj-y += kvm-arm.o init.o interrupts.o > obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index 9e193c8..d90d989 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -172,6 +172,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long > ext) > case KVM_CAP_IRQCHIP: > r = vgic_present; > break; > + case KVM_CAP_IOEVENTFD: > case KVM_CAP_DEVICE_CTRL: > case KVM_CAP_USER_MEMORY: > case KVM_CAP_SYNC_MMU: > diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c > index 4cb5a93..ee332a7 100644 > --- a/arch/arm/kvm/mmio.c > +++ b/arch/arm/kvm/mmio.c > @@ -162,6 +162,21 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t > fault_ipa, > return 0; > } > > +static int handle_io_bus_rw(struct kvm_vcpu *vcpu, gpa_t addr, > + int len, void *val, bool write) > +{ > + int idx, ret; > + > + idx = srcu_read_lock(&vcpu->kvm->srcu); > + if (write) > + ret = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, val); > + else > + ret = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, val); > + srcu_read_unlock(&vcpu->kvm->srcu, idx); > + > + return ret; > +} > + > int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, >phys_addr_t fault_ipa) > { > @@ -200,6 +215,10 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run > *run, > if (vgic_handle_mmio(vcpu, run, &mmio)) > return 1; > > + if (!handle_io_bus_rw(vcpu, fault_ipa, mmio.len, &mmio.data, > + mmio.is_write)) > + return 1; > + > kvm_prepare_mmio(run, &mmio); > 
return 0; > } > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig > index 8ba85e9..642f57c 100644 > --- a/arch/arm64/kvm/Kconfig > +++ b/arch/arm64/kvm/Kconfig > @@ -26,6 +26,7 @@ config KVM > select KVM_ARM_HOST > select KVM_ARM_VGIC > select KVM_ARM_TIMER > + select HAVE_KVM_EVENTFD > ---help--- > Support hosting virtualized guest machines. > > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile > index 32a0961..2e6b827 100644 > --- a/arch/arm64/kvm/Makefile > +++ b/arch/arm64/kvm/Makefile > @@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm > > obj-$(CONFIG_KVM_ARM_HOST) += kvm.o > > -kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o > +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o > $(KVM)/eventfd.o > kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o > kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info
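For context, what the patch enables is the standard KVM_IOEVENTFD registration path, so a VMM can turn a guest MMIO write into an eventfd wakeup instead of a full exit to userspace. A rough sketch of the userspace side (the address and access length are illustrative):

    #include <sys/ioctl.h>
    #include <sys/eventfd.h>
    #include <linux/kvm.h>

    /* register an eventfd that fires when the guest writes 4 bytes to gpa
     * (e.g. a virtio queue notify doorbell) */
    static int register_mmio_ioeventfd(int vm_fd, __u64 gpa)
    {
            int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
            struct kvm_ioeventfd ioev = {
                    .addr  = gpa,
                    .len   = 4,
                    .fd    = efd,
                    .flags = 0,     /* MMIO (not PIO), no datamatch */
            };

            if (efd < 0)
                    return -1;
            if (ioctl(vm_fd, KVM_IOEVENTFD, &ioev) < 0)
                    return -1;
            return efd;             /* the I/O thread polls this fd instead */
    }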
Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
On 21/11/2014 12:46, Christoffer Dall wrote: > Hi Paolo, > > I think these look good, would you mind queueing them as either a fix or > for 3.19 as you see fit, assuming you agree with the content? Ah, I was thinking _you_ would queue them for 3.19. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.10] vhost-net: backport extend device allocation to 3.10
On 12.10.2014 13:30, Michael S. Tsirkin wrote: On Thu, Oct 09, 2014 at 08:41:23AM +0400, Dmitry Petuhov wrote: Cc: Michael Mueller Signed-off-by: Romain Francoise Acked-by: Michael S. Tsirkin [mityapetuhov: backport to v3.10: vhost_net_free() in one more place] Signed-off-by: Dmitry Petuhov Sounds reasonable. Acked-by: Michael S. Tsirkin Do I need to take any extra actions to see it in the next 3.10 release? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
On Fri, Nov 21, 2014 at 02:06:40PM +0100, Paolo Bonzini wrote: > > > On 21/11/2014 12:46, Christoffer Dall wrote: > > Hi Paolo, > > > > I think these look good, would you mind queueing them as either a fix or > > for 3.19 as you see fit, assuming you agree with the content? > > Ah, I was thinking _you_ would queue them for 3.19. > We can do that, did I miss your previous ack or reviewed-by? -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 20/11/2014 23:22, Eric S. Johansson wrote: > I'll be able to run some tests in about 2 to 3 hours after I finish this > document. Let me know what I should look at? on a side note, a pointer > to an automated install process would be wonderful. GNOME Boxes can pretty much automate the install process. Can you just run "ps aux" while the install is running and send the result? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM causes #GP on XRSTORS
On 20/11/2014 17:34, Nadav Amit wrote: > Fenghua, > > I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to > get a #GP when it is trying to load the guest FPU. > One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I > am not sure it is the only problem. > Was KVM ever tested with XRSTORS? What is the content of the CPUID[EAX=13,ECX=0] and CPUID[EAX=13,ECX=1] leaves on the host? Fenghua, which processors have XSAVEC, which have XGETBV with ECX=1, and which have XSAVES? We need to expose this in QEMU, for which I can send a patch later today or next week (CCing Eduardo for this). We will also have to uncompact the XSAVE area either in KVM_GET_XSAVE or in QEMU. It's probably not hard to do it in the kernel. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
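For reference, the compacted-format requirement Nadav refers to lives in the 64-byte XSAVE header that follows the legacy region of the XSAVE area. A rough check of the documented #GP conditions for XRSTORS is sketched below (alignment and reserved-bit checks omitted; the macro name is assumed for this sketch):

    #include <stdint.h>

    /* header that follows the 512-byte legacy region of an XSAVE area */
    struct xstate_header {
            uint64_t xstate_bv;
            uint64_t xcomp_bv;      /* bit 63 = compacted format, required by XRSTORS */
            uint64_t reserved[6];
    };

    #define XCOMP_BV_COMPACTED (1ULL << 63)

    static int xsave_area_ok_for_xrstors(const struct xstate_header *hdr,
                                         uint64_t xcr0_or_xss)
    {
            return (hdr->xcomp_bv & XCOMP_BV_COMPACTED) &&                  /* else #GP */
                   !((hdr->xcomp_bv & ~XCOMP_BV_COMPACTED) & ~xcr0_or_xss) &&
                   !(hdr->xstate_bv & ~hdr->xcomp_bv);
    }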
Re: [PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
2014-11-20 14:42+0100, Paolo Bonzini: > ia64 does not need them anymore. (Similar for device assignment and iommu, should I prepare patches?) > Signed-off-by: Paolo Bonzini > --- At least one compile-breaker on arches without IOAPIC, > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index ea53b04993f2..d2d42709d6f4 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > +#ifdef __KVM_HAVE_IOAPIC > +void kvm_vcpu_request_scan_ioapic(struct kvm *kvm); > +#else > +static inline void kvm_vcpu_request-scan_ioapic(struct kvm *kvm) ^_- > +{ > +} > +#endif > + Reviewed-by: Radim Krčmář And we could clean them up as well: ---8<--- KVM: x86: remove IA64 from ioapic.c and irq_comm.c They won't get compiled in x86 tree. Signed-off-by: Radim Krčmář --- arch/x86/kvm/ioapic.c | 12 arch/x86/kvm/irq_comm.c | 41 ++--- 2 files changed, 2 insertions(+), 51 deletions(-) diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c index 0ba4057..b1947e0 100644 --- a/arch/x86/kvm/ioapic.c +++ b/arch/x86/kvm/ioapic.c @@ -270,7 +270,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, spin_unlock(&ioapic->lock); } -#ifdef CONFIG_X86 void kvm_vcpu_request_scan_ioapic(struct kvm *kvm) { struct kvm_ioapic *ioapic = kvm->arch.vioapic; @@ -279,12 +278,6 @@ void kvm_vcpu_request_scan_ioapic(struct kvm *kvm) return; kvm_make_scan_ioapic_request(kvm); } -#else -void kvm_vcpu_request_scan_ioapic(struct kvm *kvm) -{ - return; -} -#endif static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) { @@ -586,11 +579,6 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len, case IOAPIC_REG_WINDOW: ioapic_write_indirect(ioapic, data); break; -#ifdef CONFIG_IA64 - case IOAPIC_REG_EOI: - __kvm_ioapic_update_eoi(NULL, ioapic, data, IOAPIC_LEVEL_TRIG); - break; -#endif default: break; diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c index a8f988c..72298b3 100644 --- a/arch/x86/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -26,9 +26,6 @@ #include #include -#ifdef CONFIG_IA64 -#include -#endif #include "irq.h" @@ -38,12 +35,8 @@ static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, int irq_source_id, int level, bool line_status) { -#ifdef CONFIG_X86 struct kvm_pic *pic = pic_irqchip(kvm); return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level); -#else - return -1; -#endif } static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e, @@ -57,12 +50,7 @@ static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e, inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) { -#ifdef CONFIG_IA64 - return irq->delivery_mode == - (IOSAPIC_LOWEST_PRIORITY << IOSAPIC_DELIVERY_SHIFT); -#else return irq->delivery_mode == APIC_DM_LOWEST; -#endif } int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, @@ -202,9 +190,7 @@ int kvm_request_irq_source_id(struct kvm *kvm) } ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID); -#ifdef CONFIG_X86 ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID); -#endif set_bit(irq_source_id, bitmap); unlock: mutex_unlock(&kvm->irq_lock); @@ -215,9 +201,7 @@ unlock: void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id) { ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID); -#ifdef CONFIG_X86 ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID); -#endif mutex_lock(&kvm->irq_lock); if (irq_source_id < 0 || @@ -230,9 +214,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id) goto unlock; 
kvm_ioapic_clear_all(kvm->arch.vioapic, irq_source_id); -#ifdef CONFIG_X86 kvm_pic_clear_all(pic_irqchip(kvm), irq_source_id); -#endif unlock: mutex_unlock(&kvm->irq_lock); } @@ -322,16 +304,11 @@ out: .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } } #define ROUTING_ENTRY1(irq) IOAPIC_ROUTING_ENTRY(irq) -#ifdef CONFIG_X86 -# define PIC_ROUTING_ENTRY(irq) \ +#define PIC_ROUTING_ENTRY(irq) \ { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \ .u.irqchip = { .irqchip = SELECT_PIC(irq), .pin = (irq) % 8 } } -# define ROUTING_ENTRY2(irq) \ +#define ROUTING_ENTRY2(irq) \ IOAPIC_ROUTING_ENTRY(irq), PIC_ROUTING_ENTRY(irq) -#else -# define ROUTING_ENTRY2(irq) \ - IOAPIC_ROUTING_ENTRY(irq) -#endif static const struct kvm_irq_routing_entry default_routing[] = { ROUTING_ENTRY2(0), ROUTING_ENTRY2(1), @@ -346,20 +323,6 @@ static const struct kvm_irq_routing_entry
Re: [PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
On 21/11/2014 17:19, Radim Krčmář wrote: > 2014-11-20 14:42+0100, Paolo Bonzini: >> ia64 does not need them anymore. > > (Similar for device assignment and iommu, should I prepare patches?) Sure! Feel free to join the party. ;) Paolo >> Signed-off-by: Paolo Bonzini >> --- > > At least one compile-breaker on arches without IOAPIC, > >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h >> index ea53b04993f2..d2d42709d6f4 100644 >> --- a/include/linux/kvm_host.h >> +++ b/include/linux/kvm_host.h >> +#ifdef __KVM_HAVE_IOAPIC >> +void kvm_vcpu_request_scan_ioapic(struct kvm *kvm); >> +#else >> +static inline void kvm_vcpu_request-scan_ioapic(struct kvm *kvm) > ^_- >> +{ >> +} >> +#endif >> + > > Reviewed-by: Radim Krčmář > > And we could clean them up as well: Will squash in next monday. Paolo > ---8<--- > KVM: x86: remove IA64 from ioapic.c and irq_comm.c > > They won't get compiled in x86 tree. > > Signed-off-by: Radim Krčmář > --- > arch/x86/kvm/ioapic.c | 12 > arch/x86/kvm/irq_comm.c | 41 ++--- > 2 files changed, 2 insertions(+), 51 deletions(-) > > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c > index 0ba4057..b1947e0 100644 > --- a/arch/x86/kvm/ioapic.c > +++ b/arch/x86/kvm/ioapic.c > @@ -270,7 +270,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 > *eoi_exit_bitmap, > spin_unlock(&ioapic->lock); > } > > -#ifdef CONFIG_X86 > void kvm_vcpu_request_scan_ioapic(struct kvm *kvm) > { > struct kvm_ioapic *ioapic = kvm->arch.vioapic; > @@ -279,12 +278,6 @@ void kvm_vcpu_request_scan_ioapic(struct kvm *kvm) > return; > kvm_make_scan_ioapic_request(kvm); > } > -#else > -void kvm_vcpu_request_scan_ioapic(struct kvm *kvm) > -{ > - return; > -} > -#endif > > static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) > { > @@ -586,11 +579,6 @@ static int ioapic_mmio_write(struct kvm_io_device *this, > gpa_t addr, int len, > case IOAPIC_REG_WINDOW: > ioapic_write_indirect(ioapic, data); > break; > -#ifdef CONFIG_IA64 > - case IOAPIC_REG_EOI: > - __kvm_ioapic_update_eoi(NULL, ioapic, data, IOAPIC_LEVEL_TRIG); > - break; > -#endif > > default: > break; > diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c > index a8f988c..72298b3 100644 > --- a/arch/x86/kvm/irq_comm.c > +++ b/arch/x86/kvm/irq_comm.c > @@ -26,9 +26,6 @@ > #include > > #include > -#ifdef CONFIG_IA64 > -#include > -#endif > > #include "irq.h" > > @@ -38,12 +35,8 @@ static int kvm_set_pic_irq(struct > kvm_kernel_irq_routing_entry *e, > struct kvm *kvm, int irq_source_id, int level, > bool line_status) > { > -#ifdef CONFIG_X86 > struct kvm_pic *pic = pic_irqchip(kvm); > return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level); > -#else > - return -1; > -#endif > } > > static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e, > @@ -57,12 +50,7 @@ static int kvm_set_ioapic_irq(struct > kvm_kernel_irq_routing_entry *e, > > inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) > { > -#ifdef CONFIG_IA64 > - return irq->delivery_mode == > - (IOSAPIC_LOWEST_PRIORITY << IOSAPIC_DELIVERY_SHIFT); > -#else > return irq->delivery_mode == APIC_DM_LOWEST; > -#endif > } > > int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, > @@ -202,9 +190,7 @@ int kvm_request_irq_source_id(struct kvm *kvm) > } > > ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID); > -#ifdef CONFIG_X86 > ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID); > -#endif > set_bit(irq_source_id, bitmap); > unlock: > mutex_unlock(&kvm->irq_lock); > @@ -215,9 +201,7 @@ unlock: 
> void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id) > { > ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID); > -#ifdef CONFIG_X86 > ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID); > -#endif > > mutex_lock(&kvm->irq_lock); > if (irq_source_id < 0 || > @@ -230,9 +214,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int > irq_source_id) > goto unlock; > > kvm_ioapic_clear_all(kvm->arch.vioapic, irq_source_id); > -#ifdef CONFIG_X86 > kvm_pic_clear_all(pic_irqchip(kvm), irq_source_id); > -#endif > unlock: > mutex_unlock(&kvm->irq_lock); > } > @@ -322,16 +304,11 @@ out: > .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } } > #define ROUTING_ENTRY1(irq) IOAPIC_ROUTING_ENTRY(irq) > > -#ifdef CONFIG_X86 > -# define PIC_ROUTING_ENTRY(irq) \ > +#define PIC_ROUTING_ENTRY(irq) \ > { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \ > .u.irqchip = { .irqchip = SELECT_PIC(irq), .pin = (irq) % 8 } } > -# define ROUTING_E
Re: can I make this work… (Foundation for accessibility project)
On 11/21/2014 09:06 AM, Paolo Bonzini wrote: On 20/11/2014 23:22, Eric S. Johansson wrote: I'll be able to run some tests in about 2 to 3 hours after I finish this document. Let me know what I should look at? on a side note, a pointer to an automated install process would be wonderful. GNOME Boxes can pretty much automate the install process. Can you just run "ps aux" while the install is running and send the result? I went back and verified I had installed all packages. apparently I missed a few updates. also I was more familiar with the UI tool. I noticed a few places where kvm was now an option. last I made a copy of the dvd to an iso as an install image. end result is *wow* much faster. I now have hope that my project will work. sure does like giving 110% in cpu speed. 4384 libvirt+ 20 0 2825112 2.058g 9960 R 109.1 26.6 12:47.73 qemu-system-x86 next report after updates install btw, would you like a better UI design for a management tool? I have some ideas but would need someone with hands to put it together. --- eric top sez Tasks: 182 total, 4 running, 178 sleeping, 0 stopped, 0 zombie %Cpu(s): 44.2 us, 14.9 sy, 0.0 ni, 38.7 id, 2.0 wa, 0.0 hi, 0.2 si, 0.0 st KiB Mem: 8128204 total, 4750320 used, 3377884 free,54476 buffers KiB Swap: 8338428 total,0 used, 8338428 free. 1996164 cached Mem PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 4384 libvirt+ 20 0 2634992 2.033g 9940 R 108.6 26.2 2:02.83 qemu-syste+ 2668 eric 20 0 1284184 66308 29828 S 2.3 0.8 0:21.50 compiz 1314 root 20 0 1032288 22264 11436 S 2.0 0.3 0:46.29 libvirtd 18 root 20 0 0 0 0 S 1.7 0.0 0:00.96 kworker/1:0 1423 root 20 0 410736 49196 35228 S 1.7 0.6 0:32.18 Xorg 4694 root 20 0 0 0 0 R 1.7 0.0 0:00.20 kworker/0:1 2837 eric 20 0 1481612 102828 38476 S 1.0 1.3 0:54.03 python 2628 eric 20 0 20232940768 S 0.3 0.0 0:00.69 syndaemon 3047 eric 20 0 653160 20868 12472 S 0.3 0.3 0:02.14 gnome-term+ 3147 eric 20 0 377868 4168 3288 S 0.3 0.1 0:00.04 deja-dup-m+ 1 root 20 0 33908 3280 1472 S 0.0 0.0 0:01.62 init 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0.0 0.0 0:00.16 ksoftirqd/0 4 root 20 0 0 0 0 S 0.0 0.0 0:00.72 kworker/0:0 5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+ 7 root 20 0 0 0 0 S 0.0 0.0 0:00.50 rcu_sched 8 root 20 0 0 0 0 R 0.0 0.0 0:00.40 rcuos/0 eric@garnet:~$ ps aux sez eric@garnet:~$ ps -aux USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND root 1 0.1 0.0 33908 3280 ?Ss 11:12 0:01 /sbin/init root 2 0.0 0.0 0 0 ?S11:12 0:00 [kthreadd] root 3 0.0 0.0 0 0 ?S11:12 0:00 [ksoftirqd/0] root 4 0.0 0.0 0 0 ?S11:12 0:00 [kworker/0:0] root 5 0.0 0.0 0 0 ?S< 11:12 0:00 [kworker/0:0H] root 7 0.0 0.0 0 0 ?S11:12 0:00 [rcu_sched] root 8 0.0 0.0 0 0 ?S11:12 0:00 [rcuos/0] root 9 0.0 0.0 0 0 ?S11:12 0:00 [rcuos/1] root10 0.0 0.0 0 0 ?S11:12 0:00 [rcu_bh] root11 0.0 0.0 0 0 ?S11:12 0:00 [rcuob/0] root12 0.0 0.0 0 0 ?S11:12 0:00 [rcuob/1] root13 0.0 0.0 0 0 ?S11:12 0:00 [migration/0] root14 0.0 0.0 0 0 ?S11:12 0:00 [watchdog/0] root15 0.0 0.0 0 0 ?S11:12 0:00 [watchdog/1] root16 0.0 0.0 0 0 ?S11:12 0:00 [migration/1] root17 0.0 0.0 0 0 ?S11:12 0:00 [ksoftirqd/1] root18 0.0 0.0 0 0 ?S11:12 0:01 [kworker/1:0] root19 0.0 0.0 0 0 ?S< 11:12 0:00 [kworker/1:0H] root20 0.0 0.0 0 0 ?S< 11:12 0:00 [khelper] root21 0.0 0.0 0 0 ?S11:12 0:00 [kdevtmpfs] root22 0.0 0.0 0 0 ?S< 11:12 0:00 [netns] root23 0.0 0.0 0 0 ?S< 11:12 0:00 [writeback] root24 0.0 0.0 0 0 ?S< 11:12 0:00 [kintegrityd] root25 0.0 0.0 0 0 ?S< 11:12 0:00 [bioset] root26 0.0 0.0 0 0 ?S< 11:12 0:00 [kworker/u5:0] root27 0.0 0.0 0 0 ?S< 11:12 0:00 
[kblockd] root28 0.0 0.0 0 0 ?S< 11:
Re: [PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
On 21/11/2014 17:19, Radim Krčmář wrote: > KVM: x86: remove IA64 from ioapic.c and irq_comm.c > > They won't get compiled in x86 tree. Ah no, these were already in my ia64 removal patch. I had a deja-vu feeling... Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] arm64: KVM: fix unmapping with 48-bit VAs
From: Mark Rutland Currently if using a 48-bit VA, tearing down the hyp page tables (which can happen in the absence of a GICH or GICV resource) results in the rather nasty splat below, evidently becasue we access a table that doesn't actually exist. Commit 38f791a4e499792e (arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2) added a pgd_none check to __create_hyp_mappings to account for the additional level of tables, but didn't add a corresponding check to unmap_range, and this seems to be the source of the problem. This patch adds the missing pgd_none check, ensuring we don't try to access tables that don't exist. Original splat below: kvm [1]: Using HYP init bounce page @83fe94a000 kvm [1]: Cannot obtain GICH resource Unable to handle kernel paging request at virtual address 7f7fff00 pgd = 8077 [7f7fff00] *pgd= Internal error: Oops: 9604 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc2+ #89 task: 8003eb50 ti: 8003eb45c000 task.ti: 8003eb45c000 PC is at unmap_range+0x120/0x580 LR is at free_hyp_pgds+0xac/0xe4 pc : [] lr : [] pstate: 8045 sp : 8003eb45fbf0 x29: 8003eb45fbf0 x28: 80736000 x27: 80735000 x26: 7f7fff00 x25: 4000 x24: 806f5000 x23: x22: 007f x21: 8000 x20: 0080 x19: x18: 80648000 x17: 80537228 x16: x15: 001f x14: x13: 0001 x12: 0020 x11: 0062 x10: 0006 x9 : x8 : 0063 x7 : 0018 x6 : 0003ff00 x5 : 80744188 x4 : 0001 x3 : 4000 x2 : 8000 x1 : 007f x0 : 3fff Process swapper/0 (pid: 1, stack limit = 0x8003eb45c058) Stack: (0x8003eb45fbf0 to 0x8003eb46) fbe0: eb45fcb0 8003 0009cad8 8000 fc00: 0080 00736140 8000 00736000 8000 7c80 fc20: 0080 006f5000 8000 0080 00743000 8000 fc40: 00735000 8000 006d3030 8000 006fe7b8 8000 0080 fc60: 007f fdac1000 8003 fd94b000 8003 fda47000 8003 fc80: 00502b40 8000 ff00 7f7f fdec6000 8003 fdac1630 8003 fca0: eb45fcb0 8003 007f eb45fd00 8003 0009b378 8000 fcc0: ffea 006fe000 8000 00736728 8000 00736120 8000 fce0: 0040 00743000 8000 006fe7b8 8000 0050cd48 fd00: eb45fd60 8003 00096070 8000 006f06e0 8000 006f06e0 8000 fd20: fd948b40 8003 0009a320 8000 fd40: 0ae0 006aa25c 8000 eb45fd60 8003 0017ca44 0002 fd60: eb45fdc0 8003 0009a33c 8000 006f06e0 8000 006f06e0 8000 fd80: fd948b40 8003 0009a320 8000 00735000 8000 fda0: 006d3090 8000 006aa25c 8000 00735000 8000 006d3030 8000 fdc0: eb45fdd0 8003 000814c0 8000 eb45fe50 8003 006aaac4 8000 fde0: 006ddd90 8000 0006 006d3000 8000 0095 fe00: 006a1e90 8000 00735000 8000 006d3000 8000 006aa25c 8000 fe20: 00735000 8000 006d3030 8000 eb45fe50 8003 006fac68 8000 fe40: 0006 0006 fe293ee6 8003 eb45feb0 8003 004f8ee8 8000 fe60: 004f8ed4 8000 00735000 8000 fe80: fea0: 000843d0 8000 fec0: 004f8ed4 8000 fee0: ff00: ff20: ff40: ff60: ff80: ffa0: ffc0: 0005 ffe0: Call trace: [] unmap_range+0x120/0x580 [] free_hyp_pgds+0xa8/0xe4 [] kvm_arch_init+0x268/0x44c [] kvm_init+0x24/0x260 [] arm_init+0x18/0x24 [] do_one_initcall+0x88/0x1a0 [] kernel_init_freeable+0x148/0x1e8 [] kernel_init+0x10/0xd4 Code: 8b000263 92628479 d1000720 eb01001f (f9400340) ---[ end trace 3bc230562e926fa4 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Jun
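The patch body itself is not quoted above (only the oops), but the fix described amounts to guarding the descent into the PUD level with a pgd_none() check, mirroring what commit 38f791a4e499792e already did for __create_hyp_mappings. A rough sketch of that shape, with helper names approximated rather than taken from the actual diff:

/* Sketch only: skip top-level entries that were never populated, so the
 * walk never dereferences a table that does not exist. */
static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
			phys_addr_t start, u64 size)
{
	phys_addr_t addr = start, end = start + size;
	phys_addr_t next;
	pgd_t *pgd = pgdp + pgd_index(addr);

	do {
		next = pgd_addr_end(addr, end);
		if (!pgd_none(*pgd))	/* the check that was missing */
			unmap_puds(kvm, pgd, addr, next);
	} while (pgd++, addr = next, addr != end);
}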
[PULL] arm/arm64: KVM: pull request for 3.18-rc6
Hi Paolo, Please consider pulling the following patches that fixes a few issues for KVM on arm/arm64. The following changes since commit 41e7ed64d86db351a94063596b478a0bfc040258: KVM: nVMX: Disable preemption while reading from shadow VMCS (2014-10-29 13:13:52 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvm-arm-for-3.18-rc6 for you to fetch changes up to 837711af0e99718af5a8cc84fe42ea335c9c71ce: arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create() (2014-11-21 17:00:57 +) Updates for KVM/{arm,arm64}, fixing a few issues: - fix an unmap error when using 48bit VAs - trap access to ICC_SRE_EL1 when the guest is trying to use GICv3 - return an error when userspace is trying to init the vgic on a running VM Christoffer Dall (2): arm64: KVM: Handle traps of ICC_SRE_EL1 as RAZ/WI arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create() Mark Rutland (1): arm64: KVM: fix unmapping with 48-bit VAs arch/arm/kvm/mmu.c| 3 ++- arch/arm64/kvm/sys_regs.c | 9 + virt/kvm/arm/vgic.c | 8 3 files changed, 15 insertions(+), 5 deletions(-) -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] arm64: KVM: Handle traps of ICC_SRE_EL1 as RAZ/WI
From: Christoffer Dall When running on a system with a GICv3, we currenly don't allow the guest to access the system register interface of the GICv3. We do this by clearing the ICC_SRE_EL2.Enable, which causes all guest accesses to ICC_SRE_EL1 to trap to EL2 and causes all guest accesses to other ICC_ registers to cause an undefined exception in the guest. However, we currently don't handle the trap of guest accesses to ICC_SRE_EL1 and will spill out a warning. The trap just needs to handle the access as RAZ/WI, and a guest that tries to prod this register and set ICC_SRE_EL1.SRE=1, must read back the value (which Linux already does) to see if it succeeded, and will thus observe that ICC_SRE_EL1.SRE was not set. Add the simple trap handler in the sorted table of the system registers. Signed-off-by: Christoffer Dall [ardb: added cp15 handling] Signed-off-by: Ard Biesheuvel Signed-off-by: Marc Zyngier --- arch/arm64/kvm/sys_regs.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 4cc3b71..3d7c2df 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -424,6 +424,11 @@ static const struct sys_reg_desc sys_reg_descs[] = { /* VBAR_EL1 */ { Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b), Op2(0b000), NULL, reset_val, VBAR_EL1, 0 }, + + /* ICC_SRE_EL1 */ + { Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b1100), Op2(0b101), + trap_raz_wi }, + /* CONTEXTIDR_EL1 */ { Op0(0b11), Op1(0b000), CRn(0b1101), CRm(0b), Op2(0b001), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 }, @@ -690,6 +695,10 @@ static const struct sys_reg_desc cp15_regs[] = { { Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR }, { Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, c10_AMAIR0 }, { Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, c10_AMAIR1 }, + + /* ICC_SRE */ + { Op1( 0), CRn(12), CRm(12), Op2( 5), trap_raz_wi }, + { Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID }, }; -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
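The diff above only adds table entries that point at trap_raz_wi; the handler itself already exists in sys_regs.c and is not part of this patch. Its behaviour is exactly what RAZ/WI implies: writes are silently dropped and reads return zero, so a guest that sets ICC_SRE_EL1.SRE=1 and reads the register back observes 0. Roughly (helper names and signature approximated, not taken from this patch):

/* Sketch of a RAZ/WI trap handler: writes are ignored, reads return 0. */
static bool trap_raz_wi(struct kvm_vcpu *vcpu,
			const struct sys_reg_params *p,
			const struct sys_reg_desc *r)
{
	if (p->is_write)
		return ignore_write(vcpu, p);	/* WI: discard the value */
	else
		return read_zero(vcpu, p);	/* RAZ: destination register := 0 */
}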
[PATCH 3/3] arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create()
From: Christoffer Dall If we detect another vCPU is running we just exit and return 0 as if we succesfully created the VGIC, but the VGIC wouldn't actual be created. This shouldn't break in-kernel behavior because the kernel will not observe the failed the attempt to create the VGIC, but userspace could be rightfully confused. Cc: Andre Przywara Signed-off-by: Christoffer Dall Signed-off-by: Marc Zyngier --- virt/kvm/arm/vgic.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 3aaca49..aacdb59 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1933,7 +1933,7 @@ out: int kvm_vgic_create(struct kvm *kvm) { - int i, vcpu_lock_idx = -1, ret = 0; + int i, vcpu_lock_idx = -1, ret; struct kvm_vcpu *vcpu; mutex_lock(&kvm->lock); @@ -1948,6 +1948,7 @@ int kvm_vgic_create(struct kvm *kvm) * vcpu->mutex. By grabbing the vcpu->mutex of all VCPUs we ensure * that no other VCPUs are run while we create the vgic. */ + ret = -EBUSY; kvm_for_each_vcpu(i, vcpu, kvm) { if (!mutex_trylock(&vcpu->mutex)) goto out_unlock; @@ -1955,11 +1956,10 @@ int kvm_vgic_create(struct kvm *kvm) } kvm_for_each_vcpu(i, vcpu, kvm) { - if (vcpu->arch.has_run_once) { - ret = -EBUSY; + if (vcpu->arch.has_run_once) goto out_unlock; - } } + ret = 0; spin_lock_init(&kvm->arch.vgic.lock); kvm->arch.vgic.in_kernel = true; -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
2014-11-21 18:05+0100, Paolo Bonzini: > On 21/11/2014 17:19, Radim Krčmář wrote: > > KVM: x86: remove IA64 from ioapic.c and irq_comm.c > > > > They won't get compiled in x86 tree. > > Ah no, these were already in my ia64 removal patch. I had a deja-vu > feeling... Oops, renaming simplifies conflict resolution ... CONFIG_X86 removal should still be applicable though. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 21/11/2014 17:52, Eric S. Johansson wrote: > > 4384 libvirt+ 20 0 2825112 2.058g 9960 R 109.1 26.6 12:47.73 > qemu-system-x86 > > next report after updates install > > btw, would you like a better UI design for a management tool? I have > some ideas but would need someone with hands to put it together. I don't develop the management tool, but there are several. The most advanced UI is probably in GNOME Boxes, but it also has less functionality than virt-manager. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
next puzzle: Re: can I make this work… (Foundation for accessibility project)
On 11/21/2014 11:52 AM, Eric S. Johansson wrote: 4384 libvirt+ 20 0 2825112 2.058g 9960 R 109.1 26.6 12:47.73 qemu-system-x86 next report after updates install next puzzle. updates are not working using bridged to eth0 using virt io driver (checked install on windows) browser works in vm (quite well in fact) watching output of tcpdump and there is no apparent traffic for updates. any ideas? btw, would you like a better UI design for a management tool? I have some ideas but would need someone with hands to put it together. --- eric top sez Tasks: 182 total, 4 running, 178 sleeping, 0 stopped, 0 zombie %Cpu(s): 44.2 us, 14.9 sy, 0.0 ni, 38.7 id, 2.0 wa, 0.0 hi, 0.2 si, 0.0 st KiB Mem: 8128204 total, 4750320 used, 3377884 free,54476 buffers KiB Swap: 8338428 total,0 used, 8338428 free. 1996164 cached Mem PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 4384 libvirt+ 20 0 2634992 2.033g 9940 R 108.6 26.2 2:02.83 qemu-syste+ 2668 eric 20 0 1284184 66308 29828 S 2.3 0.8 0:21.50 compiz 1314 root 20 0 1032288 22264 11436 S 2.0 0.3 0:46.29 libvirtd 18 root 20 0 0 0 0 S 1.7 0.0 0:00.96 kworker/1:0 1423 root 20 0 410736 49196 35228 S 1.7 0.6 0:32.18 Xorg 4694 root 20 0 0 0 0 R 1.7 0.0 0:00.20 kworker/0:1 2837 eric 20 0 1481612 102828 38476 S 1.0 1.3 0:54.03 python 2628 eric 20 0 20232940768 S 0.3 0.0 0:00.69 syndaemon 3047 eric 20 0 653160 20868 12472 S 0.3 0.3 0:02.14 gnome-term+ 3147 eric 20 0 377868 4168 3288 S 0.3 0.1 0:00.04 deja-dup-m+ 1 root 20 0 33908 3280 1472 S 0.0 0.0 0:01.62 init 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0.0 0.0 0:00.16 ksoftirqd/0 4 root 20 0 0 0 0 S 0.0 0.0 0:00.72 kworker/0:0 5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+ 7 root 20 0 0 0 0 S 0.0 0.0 0:00.50 rcu_sched 8 root 20 0 0 0 0 R 0.0 0.0 0:00.40 rcuos/0 eric@garnet:~$ ps aux sez eric@garnet:~$ ps -aux USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND root 1 0.1 0.0 33908 3280 ?Ss 11:12 0:01 /sbin/init root 2 0.0 0.0 0 0 ?S11:12 0:00 [kthreadd] root 3 0.0 0.0 0 0 ?S11:12 0:00 [ksoftirqd/0] root 4 0.0 0.0 0 0 ?S11:12 0:00 [kworker/0:0] root 5 0.0 0.0 0 0 ?S< 11:12 0:00 [kworker/0:0H] root 7 0.0 0.0 0 0 ?S11:12 0:00 [rcu_sched] root 8 0.0 0.0 0 0 ?S11:12 0:00 [rcuos/0] root 9 0.0 0.0 0 0 ?S11:12 0:00 [rcuos/1] root10 0.0 0.0 0 0 ?S11:12 0:00 [rcu_bh] root11 0.0 0.0 0 0 ?S11:12 0:00 [rcuob/0] root12 0.0 0.0 0 0 ?S11:12 0:00 [rcuob/1] root13 0.0 0.0 0 0 ?S11:12 0:00 [migration/0] root14 0.0 0.0 0 0 ?S11:12 0:00 [watchdog/0] root15 0.0 0.0 0 0 ?S11:12 0:00 [watchdog/1] root16 0.0 0.0 0 0 ?S11:12 0:00 [migration/1] root17 0.0 0.0 0 0 ?S11:12 0:00 [ksoftirqd/1] root18 0.0 0.0 0 0 ?S11:12 0:01 [kworker/1:0] root19 0.0 0.0 0 0 ?S< 11:12 0:00 [kworker/1:0H] root20 0.0 0.0 0 0 ?S< 11:12 0:00 [khelper] root21 0.0 0.0 0 0 ?S11:12 0:00 [kdevtmpfs] root22 0.0 0.0 0 0 ?S< 11:12 0:00 [netns] root23 0.0 0.0 0 0 ?S< 11:12 0:00 [writeback] root24 0.0 0.0 0 0 ?S< 11:12 0:00 [kintegrityd] root25 0.0 0.0 0 0 ?S< 11:12 0:00 [bioset] root26 0.0 0.0 0 0 ?S< 11:12 0:00 [kworker/u5:0] root27 0.0 0.0 0 0 ?S< 11:12 0:00 [kblockd] root28 0.0 0.0 0 0 ?S< 11:12 0:00 [ata_sff] root29 0.0 0.0 0 0 ?S11:12 0:00 [khubd] root30 0.0 0.0 0 0 ?S< 11:12 0:00 [md] root31 0.0 0.0 0 0 ?S< 11:12 0:00 [devfreq_wq] root34 0.0 0.0 0 0 ?S11:12 0:00 [khungtaskd] root35 0.0 0.0 0 0 ?S11:12 0:00 [kswapd0] root36 0.1 0.0 0 0 ?SN 11:12 0:02 [ksmd] root37 0.0 0.0 0 0 ?SN 11:12 0:00 [khugepaged] root38 0
[CFT PATCH 1/2] kvm: x86: mask out XSAVES
This feature is not supported inside KVM guests yet, because we do not emulate MSR_IA32_XSS. Mask it out. Cc: sta...@vger.kernel.org Cc: Nadav Amit Signed-off-by: Paolo Bonzini --- arch/x86/kvm/cpuid.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 20d83217fb1d..a4f5ac46226c 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -320,6 +320,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) | F(AVX512CD); + /* cpuid 0xD.1.eax */ + const u32 kvm_supported_word10_x86_features = + F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1); + /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -456,13 +460,18 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, entry->eax &= supported; entry->edx &= supported >> 32; entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; + if (!supported) + break; + for (idx = 1, i = 1; idx < 64; ++idx) { u64 mask = ((u64)1 << idx); if (*nent >= maxnent) goto out; do_cpuid_1_ent(&entry[i], function, idx); - if (entry[i].eax == 0 || !(supported & mask)) + if (idx == 1) + entry[i].eax &= kvm_supported_word10_x86_features; + else if (entry[i].eax == 0 || !(supported & mask)) continue; entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
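The observable effect in the guest is that CPUID leaf 0xD, sub-leaf 1 now reports only XSAVEOPT, XSAVEC and XGETBV1; the XSAVES bit stays clear until MSR_IA32_XSS is emulated. A small guest-side check, as a sketch, assuming GCC/Clang's <cpuid.h> (bit positions per the SDM: 0=XSAVEOPT, 1=XSAVEC, 2=XGETBV1 with ECX=1, 3=XSAVES):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	/* CPUID.(EAX=0Dh, ECX=1).EAX enumerates the XSAVE extended features. */
	__cpuid_count(0x0d, 1, eax, ebx, ecx, edx);

	printf("XSAVEOPT=%u XSAVEC=%u XGETBV1=%u XSAVES=%u\n",
	       eax & 1, (eax >> 1) & 1, (eax >> 2) & 1, (eax >> 3) & 1);
	return 0;
}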
[CFT PATCH 2/2] KVM: x86: support XSAVES usage in the host
Userspace is expecting non-compacted format for KVM_GET_XSAVE, but struct xsave_struct might be using the compacted format. Convert in order to preserve userspace ABI. Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324 Cc: Fenghua Yu Cc: sta...@vger.kernel.org Cc: Nadav Amit Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 48 +++- 1 file changed, 43 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5337039427c8..7e8a20e5615a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3131,15 +3131,53 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, return 0; } +#define XSTATE_COMPACTION_ENABLED (1ULL << 63) + +static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu) +{ + struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave; + u64 xstate_bv = vcpu->arch.guest_supported_xcr0 | XSTATE_FPSSE; + u64 valid; + + /* +* Copy legacy XSAVE area, to avoid complications with CPUID +* leaves 0 and 1 in the loop below. +*/ + memcpy(dest, xsave, XSAVE_HDR_OFFSET); + + /* Set XSTATE_BV */ + *(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv; + + /* +* Copy each region from the possibly compacted offset to the +* non-compacted offset. +*/ + valid = xstate_bv & ~XSTATE_FPSSE; + if (xsave->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED) + valid &= xsave->xsave_hdr.xcomp_bv; + + while (valid) { + u64 feature = valid & -valid; + int index = fls64(feature) - 1; + void *src = get_xsave_addr(xsave, feature); + + if (src) { + u32 size, offset, ecx, edx; + cpuid_count(XSTATE_CPUID, index, + &size, &offset, &ecx, &edx); + memcpy(dest + offset, src, size); + } + + valid -= feature; + } +} + static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { if (cpu_has_xsave) { - memcpy(guest_xsave->region, - &vcpu->arch.guest_fpu.state->xsave, - vcpu->arch.guest_xstate_size); - *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] &= - vcpu->arch.guest_supported_xcr0 | XSTATE_FPSSE; + memset(guest_xsave, 0, sizeof(struct kvm_xsave)); + fill_xsave((u8 *) guest_xsave->region, vcpu); } else { memcpy(guest_xsave->region, &vcpu->arch.guest_fpu.state->fxsave, -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
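The point of fill_xsave() is that KVM_GET_XSAVE keeps handing userspace the legacy, non-compacted XSAVE image even when the kernel stores the guest FPU state in compacted form, so existing consumers keep working unchanged. A minimal sketch of that consumer side (assumes an already-created vCPU fd; error handling mostly elided):

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* XSTATE_BV sits at byte offset 512 of the non-compacted XSAVE image;
 * the constant is defined locally since it is not exported to userspace. */
#define XSAVE_HDR_OFFSET 512

/* Sketch: fetch the guest XSAVE image; because the region is always in
 * non-compacted layout, per-feature offsets come straight from CPUID 0xD. */
static uint64_t guest_xstate_bv(int vcpu_fd)
{
	struct kvm_xsave xsave;
	uint64_t xstate_bv;

	if (ioctl(vcpu_fd, KVM_GET_XSAVE, &xsave) < 0)
		return 0;
	memcpy(&xstate_bv, (uint8_t *)xsave.region + XSAVE_HDR_OFFSET,
	       sizeof(xstate_bv));
	return xstate_bv;
}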
Re: [PULL] arm/arm64: KVM: pull request for 3.18-rc6
On 21/11/2014 18:21, Marc Zyngier wrote: > git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git > tags/kvm-arm-for-3.18-rc6 Pulled, thanks. I'm not sure I'll be able to forward the request to Linus in time, though. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 88671] BUG at drivers/pci/ats.c:62! the second time booting a kvm guest with pci passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=88671 --- Comment #1 from Tom Stellard --- Created attachment 158391 --> https://bugzilla.kernel.org/attachment.cgi?id=158391&action=edit Backtrace from BUG_ON -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 88671] New: BUG at drivers/pci/ats.c:62! the second time booting a kvm guest with pci passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=88671 Bug ID: 88671 Summary: BUG at drivers/pci/ats.c:62! the second time booting a kvm guest with pci passthrough Product: Virtualization Version: unspecified Kernel Version: 3.17.3 Hardware: All OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: kvm Assignee: virtualization_...@kernel-bugs.osdl.org Reporter: tstel...@gmail.com Regression: No Created attachment 158381 --> https://bugzilla.kernel.org/attachment.cgi?id=158381&action=edit lspci I'm running into this bug while trying to use pci passthrough of an AMD BONAIRE XT (Radeon HD 7790) Steps to reproduce: 1. virsh start vm 2. virsh destroy vm 3. virsh start vm This bug only appears after starting the vm for the second time. The first time the vm boots normally and passthrough works as expected. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 88671] BUG at drivers/pci/ats.c:62! the second time booting a kvm guest with pci passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=88671 --- Comment #2 from Tom Stellard --- Created attachment 158401 --> https://bugzilla.kernel.org/attachment.cgi?id=158401&action=edit Virtual machine definition -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM causes #GP on XRSTORS
On 21/11/2014 15:46, Paolo Bonzini wrote: > Fenghua, which processors have XSAVEC, which have XGETBV with ECX=1, and > which have XSAVES? We need to expose this in QEMU, for which I can send > a patch later today or next week (CCing Eduardo for this). Actually no change in QEMU is needed to hide XSAVES; the KVM patch I just sent should be enough for "-cpu host" to work. We still need the information on processor support though, in order to enable the feature with the right -cpu options. I assume Nadav was using "-cpu host". Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 88671] BUG at drivers/pci/ats.c:62! the second time booting a kvm guest with pci passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=88671 --- Comment #3 from Tom Stellard --- I should also mention that I have this hook executing when the machine starts up: if [ "$2" = "prepare" ]; then virsh nodedev-detach pci__01_00_1 fi -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
On 10/11/2014 09:33, Ard Biesheuvel wrote: > This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in > kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. > > The problem being addressed by the patch above was that some ARM code > based the memory mapping attributes of a pfn on the return value of > kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should > be mapped as device memory. > > However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, > and the existing non-ARM users were already using it in a way which > suggests that its name should probably have been 'kvm_is_reserved_pfn' > from the beginning, e.g., whether or not to call get_page/put_page on > it etc. This means that returning false for the zero page is a mistake > and the patch above should be reverted. > > Signed-off-by: Ard Biesheuvel > --- > arch/ia64/kvm/kvm-ia64.c | 2 +- > arch/x86/kvm/mmu.c | 6 +++--- > include/linux/kvm_host.h | 2 +- > virt/kvm/kvm_main.c | 16 > 4 files changed, 13 insertions(+), 13 deletions(-) > > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c > index ec6b9acb6bea..dbe46f43884d 100644 > --- a/arch/ia64/kvm/kvm-ia64.c > +++ b/arch/ia64/kvm/kvm-ia64.c > @@ -1563,7 +1563,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > > for (i = 0; i < npages; i++) { > pfn = gfn_to_pfn(kvm, base_gfn + i); > - if (!kvm_is_mmio_pfn(pfn)) { > + if (!kvm_is_reserved_pfn(pfn)) { > kvm_set_pmt_entry(kvm, base_gfn + i, > pfn << PAGE_SHIFT, > _PAGE_AR_RWX | _PAGE_MA_WB); > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index ac1c4de3a484..978f402006ee 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -630,7 +630,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) >* kvm mmu, before reclaiming the page, we should >* unmap it from mmu first. >*/ > - WARN_ON(!kvm_is_mmio_pfn(pfn) && !page_count(pfn_to_page(pfn))); > + WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn))); > > if (!shadow_accessed_mask || old_spte & shadow_accessed_mask) > kvm_set_pfn_accessed(pfn); > @@ -2461,7 +2461,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, > spte |= PT_PAGE_SIZE_MASK; > if (tdp_enabled) > spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, > - kvm_is_mmio_pfn(pfn)); > + kvm_is_reserved_pfn(pfn)); > > if (host_writable) > spte |= SPTE_HOST_WRITEABLE; > @@ -2737,7 +2737,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu > *vcpu, >* PT_PAGE_TABLE_LEVEL and there would be no adjustment done >* here. 
>*/ > - if (!is_error_noslot_pfn(pfn) && !kvm_is_mmio_pfn(pfn) && > + if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) && > level == PT_PAGE_TABLE_LEVEL && > PageTransCompound(pfn_to_page(pfn)) && > !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) { > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index ea53b04993f2..a6059bdf7b03 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -703,7 +703,7 @@ void kvm_arch_sync_events(struct kvm *kvm); > int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); > void kvm_vcpu_kick(struct kvm_vcpu *vcpu); > > -bool kvm_is_mmio_pfn(pfn_t pfn); > +bool kvm_is_reserved_pfn(pfn_t pfn); > > struct kvm_irq_ack_notifier { > struct hlist_node link; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 25ffac9e947d..3cee7b167052 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -107,10 +107,10 @@ EXPORT_SYMBOL_GPL(kvm_rebooting); > > static bool largepages_enabled = true; > > -bool kvm_is_mmio_pfn(pfn_t pfn) > +bool kvm_is_reserved_pfn(pfn_t pfn) > { > if (pfn_valid(pfn)) > - return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)); > + return PageReserved(pfn_to_page(pfn)); > > return true; > } > @@ -1321,7 +1321,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool > atomic, bool *async, > else if ((vma->vm_flags & VM_PFNMAP)) { > pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + > vma->vm_pgoff; > - BUG_ON(!kvm_is_mmio_pfn(pfn)); > + BUG_ON(!kvm_is_reserved_pfn(pfn)); > } else { > if (async && vma_is_valid(vma, write_fault)) > *async = true; > @@ -1427,7 +1427,7 @@ static struct page *kvm_pfn_to_page(pfn_t pfn) > if (is_error_noslot_pfn(pfn)) > return KVM_ERR_PTR_BAD_PAGE; > > - if (kvm_is_mmio_pfn(pfn)) { > + if (kvm_is_reserved_pfn(pfn)) { > WARN_ON(1); > return KVM_ERR_PTR_BAD_PAGE; > } > @@ -1456,7
Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
On 21/11/2014 14:18, Christoffer Dall wrote: > On Fri, Nov 21, 2014 at 02:06:40PM +0100, Paolo Bonzini wrote: >> >> >> On 21/11/2014 12:46, Christoffer Dall wrote: >>> Hi Paolo, >>> >>> I think these look good, would you mind queueing them as either a fix or >>> for 3.19 as you see fit, assuming you agree with the content? >> >> Ah, I was thinking _you_ would queue them for 3.19. >> > We can do that, did I miss your previous ack or reviewed-by? Since there's more stuff for 3.18 I can include these too. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[CFT PATCH 0/2] KVM: support XSAVES usage in the host
The first patch ensures that XSAVES is not exposed in the guest until we emulate MSR_IA32_XSS. The second exports XSAVE data in the correct format. I tested these on a non-XSAVES system so they should not be completely broken, but I need some help. I am not even sure which XSAVE states are _not_ enabled, and thus compacted, in Linux. Note that these patches do not add support for XSAVES in the guest yet, since MSR_IA32_XSS is not emulated. If they fix the bug Nadav reported, I'll add Reported-by and commit. Thanks, Paolo Paolo Bonzini (2): kvm: x86: mask out XSAVES KVM: x86: support XSAVES usage in the host arch/x86/kvm/cpuid.c | 11 ++- arch/x86/kvm/x86.c | 48 +++- 2 files changed, 53 insertions(+), 6 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: next puzzle: Re: can I make this work… (Foundation for accessibility project)
a little more info On 11/21/2014 01:24 PM, Eric S. Johansson wrote: next puzzle. updates are not working using bridged to eth0 using virt io driver (checked install on windows) browser works in vm (quite well in fact) watching output of tcpdump and there is no apparent traffic for updates. in resource manager, svchost.exe (netsvcs) is running at 100% -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 88671] BUG at drivers/pci/ats.c:62! the second time booting a kvm guest with pci passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=88671 Alex Williamson changed: What|Removed |Added CC||alex.william...@redhat.com --- Comment #4 from Alex Williamson --- I'm not sure how you're getting to this BUG_ON, but (a) legacy KVM device assignment is deprecated and (b) the card you've chosen has known reset issues. You might want to try vfio-pci, which I know can make this card work at least once per host boot, but you're likely to get a BSOD and IOMMU faults on subsequent guest [re]boots. The reset problem with this card has been reported to AMD, but there is no solution at this time. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO
On Fri, 2014-11-21 at 06:06 +, Wu, Feng wrote: > > > -Original Message- > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Thursday, November 20, 2014 11:54 PM > > To: Wu, Feng > > Cc: pbonz...@redhat.com; kvm@vger.kernel.org; eric.auger > > Subject: Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO > > > > On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: > > > Add new group KVM_DEV_VFIO_INTERRUPT and command > > > KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it. > > > > > > This is used for VT-d Posted-Interrupts setup. > > > > Eric proposed an interface for ARM forwarded interrupts[1] using group > > KVM_DEV_VFIO_DEVICE with attributes > > KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and > > KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ. Why are we proposing yet another > > group and attributes here? Why can't we re-use the ones Eric proposes? > > > > I totally agree that I can reuse Eric's proposals. However, as Eric mentioned > in > his reply, I am using another data structure. So how about adding my own > attribute, say, KVM_DEV_VFIO_DEVICE_POSTING_IRQ in group KVM_DEV_VFIO_DEVICE. Right, Eric's latest proposal (sorry I picked the v1 links by mistake in my previous reply) includes: KVM_DEV_VFIO_DEVICE attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ So I think we'd want to add something similar for posted interrupts, some sort of "start" and "stop" attribute. At the QEMU level we'll want to abstract both of these as opportunistic IRQ accelerators, but at the KVM-VFIO level we probably need to make them distinct using a separate set of attributes. Who knows, maybe one day ARM will support posted interrupts and Intel will support forwarding... I expect the calls from the KVM-VFIO device into VFIO at the kernel level to be largely the same between the different attributes though. Thanks, Alex > > [1] https://lkml.org/lkml/2014/8/25/258 > > > > > Signed-off-by: Feng Wu > > > --- > > > Documentation/virtual/kvm/devices/vfio.txt |8 > > > include/uapi/linux/kvm.h | 14 ++ > > > 2 files changed, 22 insertions(+), 0 deletions(-) > > > > > > diff --git a/Documentation/virtual/kvm/devices/vfio.txt > > b/Documentation/virtual/kvm/devices/vfio.txt > > > index ef51740..bd99176 100644 > > > --- a/Documentation/virtual/kvm/devices/vfio.txt > > > +++ b/Documentation/virtual/kvm/devices/vfio.txt > > > @@ -13,6 +13,7 @@ VFIO-group is held by KVM. > > > > > > Groups: > > >KVM_DEV_VFIO_GROUP > > > + KVM_DEV_VFIO_INTERRUPT > > > > > > KVM_DEV_VFIO_GROUP attributes: > > >KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device > > tracking > > > @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes: > > > > > > For each, kvm_device_attr.addr points to an int32_t file descriptor > > > for the VFIO group. > > > + > > > +KVM_DEV_VFIO_INTERRUPT attributes: > > > + KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt > > configuration for > > > +VT-d Posted-Interrrupts > > > + > > > +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which > > > +include the needed information for VT-d Posted-Interrupts setup. 
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > > index 6076882..5544fcc 100644 > > > --- a/include/uapi/linux/kvm.h > > > +++ b/include/uapi/linux/kvm.h > > > @@ -943,9 +943,23 @@ struct kvm_device_attr { > > > __u64 addr; /* userspace address of attr data */ > > > }; > > > > > > +struct virq_info { > > > + __u32 index; /* index of the msi/msix entry */ > > > + int virq; /* virq of the interrupt */ > > > +}; > > > + > > > +struct kvm_posted_intr { > > > + __u32 fd; /* file descriptor of the VFIO device */ > > > + __u32 count; > > > + boolmsix; > > > > Note that MSI-X (as opposed to MSI) is a PCI concept. Being a VFIO > > interface this should operate at VFIO IRQ index and sub-index. > > Yes, I will use VFIO stuff instead. > > Thanks, > Feng > > > > > > + struct virq_info virq_info[0]; > > > +}; > > > + > > > #define KVM_DEV_VFIO_GROUP 1 > > > #define KVM_DEV_VFIO_GROUP_ADD 1 > > > #define KVM_DEV_VFIO_GROUP_DEL 2 > > > +#define KVM_DEV_VFIO_INTERRUPT 2 > > > +#define KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ 1 > > > > > > enum kvm_device_type { > > > KVM_DEV_TYPE_FSL_MPIC_20= 1, > > > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
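Whichever attribute names win out (forward/unforward vs. a posted-interrupt start/stop pair), the userspace plumbing stays the existing KVM device API: create the VFIO pseudo-device once, then issue KVM_SET_DEVICE_ATTR with the chosen group/attr and a pointer to the payload. A sketch using the group attribute that is already upstream; the posted-interrupt attribute discussed in this thread is still only a proposal and is not shown:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Sketch: create the KVM-VFIO device and attach a VFIO group fd to it.
 * A forward/posted-IRQ attribute would be set the same way, just with a
 * different group/attr pair and payload structure. */
static int kvm_vfio_add_group(int vm_fd, int vfio_group_fd)
{
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
	struct kvm_device_attr attr;
	int32_t fd = vfio_group_fd;

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;

	attr.flags = 0;
	attr.group = KVM_DEV_VFIO_GROUP;
	attr.attr  = KVM_DEV_VFIO_GROUP_ADD;
	attr.addr  = (uint64_t)(uintptr_t)&fd;	/* points at an int32_t group fd */

	return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
}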
Re: [CFT PATCH 2/2] KVM: x86: support XSAVES usage in the host
On 11/21/2014 10:31 AM, Paolo Bonzini wrote: > Userspace is expecting non-compacted format for KVM_GET_XSAVE, but > struct xsave_struct might be using the compacted format. Convert > in order to preserve userspace ABI. > > Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324 > Cc: Fenghua Yu > Cc: sta...@vger.kernel.org > Cc: Nadav Amit > Signed-off-by: Paolo Bonzini > --- > arch/x86/kvm/x86.c | 48 +++- > 1 file changed, 43 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 5337039427c8..7e8a20e5615a 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -3131,15 +3131,53 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct > kvm_vcpu *vcpu, > return 0; > } > > +#define XSTATE_COMPACTION_ENABLED (1ULL << 63) > + > +static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu) > +{ > + struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave; > + u64 xstate_bv = vcpu->arch.guest_supported_xcr0 | XSTATE_FPSSE; > + u64 valid; > + > + /* > + * Copy legacy XSAVE area, to avoid complications with CPUID > + * leaves 0 and 1 in the loop below. > + */ > + memcpy(dest, xsave, XSAVE_HDR_OFFSET); > + > + /* Set XSTATE_BV */ > + *(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv; > + > + /* > + * Copy each region from the possibly compacted offset to the > + * non-compacted offset. > + */ > + valid = xstate_bv & ~XSTATE_FPSSE; > + if (xsave->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED) > + valid &= xsave->xsave_hdr.xcomp_bv; > + > + while (valid) { > + u64 feature = valid & -valid; > + int index = fls64(feature) - 1; > + void *src = get_xsave_addr(xsave, feature); > + > + if (src) { > + u32 size, offset, ecx, edx; > + cpuid_count(XSTATE_CPUID, index, > + &size, &offset, &ecx, &edx); > + memcpy(dest + offset, src, size); Is this really the best way to do this? cpuid is serializing, so this is possibly *very* slow. --Andy -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2
Hi Peter, On Wed, Oct 29, 2014 at 05:56:59PM +, Peter Maydell wrote: > On 29 October 2014 17:46, Andrea Arcangeli wrote: > > After some chat during the KVMForum I've been already thinking it > > could be beneficial for some usage to give userland the information > > about the fault being read or write > > ...I wonder if that would let us replace the current nasty > mess we use in linux-user to detect read vs write faults > (which uses a bunch of architecture-specific hacks including > in some cases "look at the insn that triggered this SEGV and > decode it to see if it was a load or a store"; see the > various cpu_signal_handler() implementations in user-exec.c). There's currently no plan to deliver to userland read access notifications of a present page, simply because the task of the userfaultfd is to handle the page fault in userland, but if the page is mapped and readable it won't fault in the first place :). I just mean it's not like gdb read watch. Even if the region would be set to PROT_NONE it would still SEGV without triggering an userfault (after all pte_present would still true because the page is still mapped despite not being readable, so in any case it wouldn't be considered a not-present page fault). If you temporarily remove the page (which requires an unavoidable TLB flush also considering if the page was previously mapped the TLB could still resolve it for reads) it would work then, because the plan is to provide read/write fault information through the userfaultfd. In theory it would be possible to deliver PROT_NONE faults through userfault too but it doesn't make much sense because PROT_NONE still requires a TLB flush, in addition to the vma modifications/splitting/rbtree-rebalance and the mmap_sem for writing as well. Temporarily removing/moving the page with remap_anon_pages shall be much better than using PROT_NONE for this (or alternative syscall name to differentiate it further from remap_file_pages, or equivalent userfaultfd command if we decide to hide the pte/pmd mangling as userfaultfd commands instead of adding new standalone syscalls). It would have the only constraint that you must mark the region MADV_DONTFORK if you intend linux-user to ever fork or it won't work reliably (that constraint is to eliminate the need of additional rmap complexity, precisely so that it doesn't turn into something more intrusive like remap_file_pages). I assume that would be a fine constraint for linux-user. Thanks, Andrea -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
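For reference, this is roughly what a handler loop looks like against the userfaultfd interface as it later stabilized upstream. It is not the v2 RFC interface under discussion here (remap_anon_pages was eventually dropped in favour of a UFFDIO_COPY ioctl), but it shows the read/write fault information mentioned above arriving with the fault message:

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch: resolve missing-page faults for [base, base+len) by copying the
 * corresponding pages from a pre-filled buffer 'src'. Error handling and
 * threading are elided; normally the loop runs in a dedicated thread. */
static void handle_faults(void *base, void *src, size_t len, size_t page)
{
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)base, .len = len },
		.mode  = UFFDIO_REGISTER_MODE_MISSING,
	};
	struct uffd_msg msg;

	ioctl(uffd, UFFDIO_API, &api);
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
		unsigned long addr;
		int is_write;

		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		addr = msg.arg.pagefault.address & ~(page - 1);
		is_write = msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE;
		(void)is_write;		/* read vs. write info, as discussed above */

		struct uffdio_copy copy = {
			.dst = addr,
			.src = (unsigned long)src + (addr - (unsigned long)base),
			.len = page,
		};
		ioctl(uffd, UFFDIO_COPY, &copy);
	}
}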
[PATCH] kvm: x86: move assigned-dev.c and iommu.c to arch/x86/
Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář --- Or are we going to remove it instead? ;) arch/x86/include/asm/kvm_host.h | 23 +++ arch/x86/kvm/Makefile | 2 +- {virt => arch/x86}/kvm/assigned-dev.c | 0 {virt => arch/x86}/kvm/iommu.c| 0 arch/x86/kvm/x86.c| 2 +- include/linux/kvm_host.h | 26 -- virt/kvm/kvm_main.c | 2 -- 7 files changed, 25 insertions(+), 30 deletions(-) rename {virt => arch/x86}/kvm/assigned-dev.c (100%) rename {virt => arch/x86}/kvm/iommu.c (100%) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 76ff3e2..d549cf8 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1112,4 +1112,27 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data); void kvm_handle_pmu_event(struct kvm_vcpu *vcpu); void kvm_deliver_pmi(struct kvm_vcpu *vcpu); +#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT +int kvm_iommu_map_guest(struct kvm *kvm); +int kvm_iommu_unmap_guest(struct kvm *kvm); + +long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, + unsigned long arg); + +void kvm_free_all_assigned_devices(struct kvm *kvm); +#else +static inline int kvm_iommu_unmap_guest(struct kvm *kvm) +{ + return 0; +} + +static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, + unsigned long arg) +{ + return -ENOTTY; +} + +static inline void kvm_free_all_assigned_devices(struct kvm *kvm) {} +#endif + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index ee1cd92..08f790d 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -9,11 +9,11 @@ KVM := ../../../virt/kvm kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o -kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= $(KVM)/assigned-dev.o $(KVM)/iommu.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o +kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= assigned-dev.o iommu.o kvm-intel-y+= vmx.o kvm-amd-y += svm.o diff --git a/virt/kvm/assigned-dev.c b/arch/x86/kvm/assigned-dev.c similarity index 100% rename from virt/kvm/assigned-dev.c rename to arch/x86/kvm/assigned-dev.c diff --git a/virt/kvm/iommu.c b/arch/x86/kvm/iommu.c similarity index 100% rename from virt/kvm/iommu.c rename to arch/x86/kvm/iommu.c diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5337039..782e4ea 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4007,7 +4007,7 @@ long kvm_arch_vm_ioctl(struct file *filp, } default: - ; + r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg); } out: return r; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d2d4270..746e3ef 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -764,8 +764,6 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id); #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot); void kvm_iommu_unmap_pages(struct kvm *kvm, struct kvm_memory_slot *slot); -int kvm_iommu_map_guest(struct kvm *kvm); -int kvm_iommu_unmap_guest(struct kvm *kvm); int kvm_assign_device(struct kvm *kvm, struct kvm_assigned_dev_kernel 
*assigned_dev); int kvm_deassign_device(struct kvm *kvm, @@ -781,11 +779,6 @@ static inline void kvm_iommu_unmap_pages(struct kvm *kvm, struct kvm_memory_slot *slot) { } - -static inline int kvm_iommu_unmap_guest(struct kvm *kvm) -{ - return 0; -} #endif static inline void kvm_guest_enter(void) @@ -1005,25 +998,6 @@ static inline bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu) { return true; } #endif -#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT - -long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, - unsigned long arg); - -void kvm_free_all_assigned_devices(struct kvm *kvm); - -#else - -static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, - unsigned long arg) -{ - return -ENOTTY; -} - -static inline void kvm_free_all_assigned_devices(struct kvm *kvm) {} - -#endif - static inl
Re: [CFT PATCH 2/2] KVM: x86: support XSAVES usage in the host
On 21/11/2014 21:06, Andy Lutomirski wrote: >> > + cpuid_count(XSTATE_CPUID, index, >> > + &size, &offset, &ecx, &edx); >> > + memcpy(dest + offset, src, size); > Is this really the best way to do this? cpuid is serializing, so this > is possibly *very* slow. The data is in arch/x86/kernel/xsave.c, but it is not exported. But this is absolutely not a hotspot. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] KVM: x86: fix access memslots w/o hold srcu read lock
On 21/11/2014 07:30, Wanpeng Li wrote: > I test it on the other guy's Ivytown and take advantage of the qemu command > line which he used, so I forget the accurate command line which used that day. > > Paolo also reproduce the bug, Paolo, ping. It also always reproduced for me with a debug kernel from Fedora. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [question] lots of interrupts injected to vm when pressing some key w/o releasing
On 21/11/2014 15:20, Zhang, Yang Z wrote: > Zhang Haoyu wrote on 2014-11-20: >> Hi all, >> >> If I press the one of "Insert/Delete/Home/End/PageUp/PageDown/UpArrow/ >> DownArrow/LeftArrow/RightArrow" key w/o releasing, then lots of >> interrupts will be injected to vm(win7/win2008), about 8000/s, the >> system become very slow, bringing very bad experience. But the other keys >> are okay. >> And, linux guest has no this problem. >> >> If I remove the commit of 0bc830b05c667218d703f2026ec866c49df974fc, then >> the problem disappeared, but win7 guest got stuck at booting stage. And >> so strange that If the vm has only one vcpu, then the problem also >> disappeared. >> >> Any ideas? > > It looks commit 0bc830 doesn't do the right thing. The right point > to clear an edge triggered interrupt in ioapic->irr is after userspace > changes the irq line status. Otherwise, there may cause interrupt storm > if a device sets the irq line in a fix edge continuously. > > See below code: > ioapic_set_irq: > . > old_irr = ioapic->irr; > ioapic->irr |= mask; > if ((edge && old_irr == ioapic->irr) || > (!edge && entry.fields.remote_irr)) { > ret = 0; // normally, > we should break from here. But we never go to here due to (edge && old_irr != > ioapic->irr) now. > goto out; > } The IRR register means an interrupt was received and not serviced yet, similar to the LAPIC or PIC register. It is not the same thing as the interrupt line level (it happens to be for level-triggered interrupts). We observed lost interrupts during migration, and fixing the semantics of IRR was necessary in order to reinject those properly (commit 673f7b4257). If QEMU sends KVM_IRQ_LINE twice with level=1 it should be fixed---it is not supposed to do so. Commit 0bc830b05 makes the kernel IOAPIC behave the same way as QEMU's. If you want the old semantics of KVM_IRQ_LINE, that requires a separate register, different from IRR but it is not easy because they were buggy: the level of the interrupt is not part of the IOAPIC state structs in KVM, and it is not migrated in QEMU either. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
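In other words, for an edge-triggered source userspace is expected to pulse the line rather than re-assert level=1: each interrupt is a 1 followed by a 0 through KVM_IRQ_LINE. A sketch of that pulse (assumes an in-kernel irqchip has been created for the VM):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch: deliver one edge on a GSI by raising and then lowering the line. */
static int pulse_edge_irq(int vm_fd, unsigned int gsi)
{
	struct kvm_irq_level irq = { .irq = gsi, .level = 1 };

	if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)
		return -1;

	irq.level = 0;
	return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}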
Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2
On 21 November 2014 20:14, Andrea Arcangeli wrote: > Hi Peter, > > On Wed, Oct 29, 2014 at 05:56:59PM +, Peter Maydell wrote: >> On 29 October 2014 17:46, Andrea Arcangeli wrote: >> > After some chat during the KVMForum I've been already thinking it >> > could be beneficial for some usage to give userland the information >> > about the fault being read or write >> >> ...I wonder if that would let us replace the current nasty >> mess we use in linux-user to detect read vs write faults >> (which uses a bunch of architecture-specific hacks including >> in some cases "look at the insn that triggered this SEGV and >> decode it to see if it was a load or a store"; see the >> various cpu_signal_handler() implementations in user-exec.c). > > There's currently no plan to deliver to userland read access > notifications of a present page, simply because the task of the > userfaultfd is to handle the page fault in userland, but if the page > is mapped and readable it won't fault in the first place :). I just > mean it's not like gdb read watch. If it's mapped and readable-but-not-writable then it should still fault on write accesses, though? These are cases we currently get SEGV for, anyway. > Even if the region would be set to PROT_NONE it would still SEGV > without triggering an userfault (after all pte_present would still > true because the page is still mapped despite not being readable, so > in any case it wouldn't be considered a not-present page fault). Ah, I guess we have a terminology difference. I was considering "page fault" to mean (roughly) "anything that causes the CPU to take an exception on an attempted load/store" and expected that userfaultfd would notify userspace of any of those. (Well, not alignment faults, maybe, but I'm definitely surprised that access permission issues don't get reported the same way as page-completely-missing issues. In other words I was expecting that this was "everything previously reported via SIGSEGV or SIGBUS now comes via userfaultfd".) > Temporarily removing/moving the page with remap_anon_pages shall be > much better than using PROT_NONE for this (or alternative syscall name > to differentiate it further from remap_file_pages, or equivalent > userfaultfd command if we decide to hide the pte/pmd mangling as > userfaultfd commands instead of adding new standalone syscalls). We don't use PROT_NONE for the linux-user situation, we just use mprotect() to remove the PAGE_WRITE permission so it's still readable. I suspect actually linux-user would be better off implementing something like "if this is a page which we've mapped read-only because we translated code out of it, then go ahead and remap it r/w and throw away the translation and retry the access, otherwise report SEGV to the guest", because taking SEGVs shouldn't be a fast path in the guest binary. That would let us work without architecture-specific junk and without requiring new kernel features either. So you can ignore this whole tangent thread :-) thanks -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm/arm64: Enable Dirty Page logging for ARMv8 move log read, tlb flush to generic code
On 11/21/2014 02:09 AM, Christoffer Dall wrote: > On Wed, Nov 19, 2014 at 12:15:55PM -0800, Mario Smarduch wrote: >> On 11/19/2014 06:39 AM, Christoffer Dall wrote: >>> Hi Mario, >>> >>> On Fri, Nov 07, 2014 at 12:51:39PM -0800, Mario Smarduch wrote: On 11/07/2014 12:20 PM, Christoffer Dall wrote: > On Thu, Oct 09, 2014 at 07:34:07PM -0700, Mario Smarduch wrote: >> This patch enables ARMv8 dirty page logging and unifies ARMv7/ARMv8 code. >> >> Signed-off-by: Mario Smarduch >> --- >> arch/arm/include/asm/kvm_host.h | 12 >> arch/arm/kvm/arm.c | 9 - >> arch/arm/kvm/mmu.c | 17 +++-- >> arch/arm64/kvm/Kconfig | 2 +- >> 4 files changed, 12 insertions(+), 28 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_host.h >> b/arch/arm/include/asm/kvm_host.h >> index 12311a5..59565f5 100644 >> --- a/arch/arm/include/asm/kvm_host.h >> +++ b/arch/arm/include/asm/kvm_host.h >> @@ -220,18 +220,6 @@ static inline void __cpu_init_hyp_mode(phys_addr_t >> boot_pgd_ptr, >> kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr); >> } >> >> -/** >> - * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries >> - * @kvm: pointer to kvm structure. >> - * >> - * Interface to HYP function to flush all VM TLB entries without address >> - * parameter. >> - */ >> -static inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm) >> -{ >> -kvm_call_hyp(__kvm_tlb_flush_vmid, kvm); >> -} >> - >> static inline int kvm_arch_dev_ioctl_check_extension(long ext) >> { >> return 0; >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c >> index 0546fa3..6a6fd6b 100644 >> --- a/arch/arm/kvm/arm.c >> +++ b/arch/arm/kvm/arm.c >> @@ -242,7 +242,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, >> const struct kvm_memory_slot *old, >> enum kvm_mr_change change) >> { >> -#ifdef CONFIG_ARM >> /* >> * At this point memslot has been committed and there is an >> * allocated dirty_bitmap[], dirty pages will be be tracked >> while the >> @@ -250,7 +249,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, >> */ >> if ((change != KVM_MR_DELETE) && (mem->flags & >> KVM_MEM_LOG_DIRTY_PAGES)) >> kvm_mmu_wp_memory_region(kvm, mem->slot); >> -#endif >> } >> >> void kvm_arch_flush_shadow_all(struct kvm *kvm) >> @@ -783,13 +781,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp, >> } >> } >> >> -#ifdef CONFIG_ARM64 >> -int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct >> kvm_dirty_log *log) >> -{ >> -return -EINVAL; >> -} >> -#endif >> - >> static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm, >> struct kvm_arm_device_addr >> *dev_addr) >> { >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index df1a5a3..8c0f9f2 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -49,11 +49,18 @@ static phys_addr_t hyp_idmap_vector; >> >> static bool kvm_get_logging_state(struct kvm_memory_slot *memslot) >> { >> -#ifdef CONFIG_ARM >> return !!memslot->dirty_bitmap; >> -#else >> -return false; >> -#endif >> +} >> + >> +/** >> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries for ARMv7/8 >> + * @kvm:pointer to kvm structure. 
>> + * >> + * Interface to HYP function to flush all VM TLB entries >> + */ >> +inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm) >> +{ >> +kvm_call_hyp(__kvm_tlb_flush_vmid, kvm); >> } >> >> static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa) >> @@ -769,7 +776,6 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, >> phys_addr_t *ipap) >> return false; >> } >> >> -#ifdef CONFIG_ARM >> /** >> * stage2_wp_ptes - write protect PMD range >> * @pmd:pointer to pmd entry >> @@ -917,7 +923,6 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, >> >> stage2_wp_range(kvm, start, end); >> } >> -#endif >> >> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >>struct kvm_memory_slot *memslot, >> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig >> index 40a8d19..a1a35809 100644 >> --- a/arch/arm64/kvm/Kconfig >> +++ b/arch/arm64/kvm/Kconfig >> @@ -26,7 +26,7 @@ config KVM >> select KVM_ARM_HOST >> select KVM_ARM_VGI
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/21/2014 03:19 AM, Christoffer Dall wrote: > Hi Mario, > > On Wed, Nov 19, 2014 at 03:32:31PM -0800, Mario Smarduch wrote: >> Hi Laszlo, >> >> couple observations. >> >> I'm wondering if access from qemu and guest won't >> result in mixed memory attributes and if that's acceptable >> to the CPU. >> >> Also is if you update memory from qemu you may break >> dirty page logging/migration. Unless there is some other way >> you keep track. Of course it may not be applicable in your >> case (i.e. flash unused after boot). >> > I'm not concerned about this particular case; dirty page logging exists > so KVM can inform userspace when a page may have been dirtied. If > userspace directly dirties (is that a verb?) a page, I would think so, I rely on software too much :) > then it already knows that it needs to migrate that page and > deal with it accordingly. > > Or did I miss some more subtle point here QEMU has a global migration bitmap for all regions initially set dirty, and it's updated over iterations with KVM's dirty bitmap. Once dirty pages are migrated bits are cleared. If QEMU updates a memory region directly I can't see how it's reflected in that migration bitmap that determines what pages should be migrated as it makes it's passes. On x86 if host updates guest memory it marks that page dirty. But virtio writes to guest memory directly and that appears to work just fine. I read that code sometime back, and will need to revisit. - Mario > > -Christoffer > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: nVMX: nested MSR auto load/restore emulation.
Some hypervisors need MSR auto load/restore feature. We read MSRs from vm-entry MSR load area which specified by L1, and load them via kvm_set_msr in the nested entry. When nested exit occurs, we get MSRs via kvm_get_msr, writting them to L1`s MSR store area. After this, we read MSRs from vm-exit MSR load area, and load them via kvm_set_msr. VirtualBox will work fine with this patch. Signed-off-by: Wincy Van diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index 990a2fe..986af3f 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -56,6 +56,7 @@ #define EXIT_REASON_MSR_READ31 #define EXIT_REASON_MSR_WRITE 32 #define EXIT_REASON_INVALID_STATE 33 +#define EXIT_REASON_MSR_LOAD_FAIL 34 #define EXIT_REASON_MWAIT_INSTRUCTION 36 #define EXIT_REASON_MONITOR_INSTRUCTION 39 #define EXIT_REASON_PAUSE_INSTRUCTION 40 @@ -114,8 +115,12 @@ { EXIT_REASON_APIC_WRITE,"APIC_WRITE" }, \ { EXIT_REASON_EOI_INDUCED, "EOI_INDUCED" }, \ { EXIT_REASON_INVALID_STATE, "INVALID_STATE" }, \ + { EXIT_REASON_MSR_LOAD_FAIL, "MSR_LOAD_FAIL" }, \ { EXIT_REASON_INVD, "INVD" }, \ { EXIT_REASON_INVVPID, "INVVPID" }, \ { EXIT_REASON_INVPCID, "INVPCID" } +#define VMX_ABORT_SAVE_GUEST_MSR_FAIL1 +#define VMX_ABORT_LOAD_HOST_MSR_FAIL 4 + #endif /* _UAPIVMX_H */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6a951d8..377e405 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6088,6 +6088,13 @@ static void nested_vmx_failValid(struct kvm_vcpu *vcpu, */ } +static void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator) +{ + /* TODO: not to simply reset guest here. */ + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + printk(KERN_WARNING"kvm: nested vmx abort, indicator %d\n", indicator); +} + static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer) { struct vcpu_vmx *vmx = @@ -8215,6 +8222,88 @@ static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu) ns_to_ktime(preemption_timeout), HRTIMER_MODE_REL); } +static inline int nested_msr_check_common(struct vmx_msr_entry *e) +{ + if (e->index >> 8 == 0x8 || e->reserved != 0) + return -EINVAL; +return 0; +} + +static inline int nested_load_msr_check(struct vmx_msr_entry *e) +{ + if (e->index == MSR_FS_BASE || +e->index == MSR_GS_BASE || +nested_msr_check_common(e)) + return -EINVAL; + return 0; +} + +/* load guest msr at nested entry. + * return 0 for success, entry index for failed. 
+ */ +static u32 nested_entry_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count) +{ + u32 i = 0; + struct vmx_msr_entry e; + struct msr_data msr; + + msr.host_initiated = false; + while (i < count) { + kvm_read_guest(vcpu->kvm, gpa + i * sizeof(struct vmx_msr_entry), + &e, sizeof(struct vmx_msr_entry)); + if (nested_load_msr_check(&e)) + goto fail; + msr.index = e.index; + msr.data = e.value; + if (kvm_set_msr(vcpu, &msr)) + goto fail; + ++i; +} + return 0; +fail: + return i + 1; +} + +static int nested_exit_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count) +{ + u32 i = 0; + struct vmx_msr_entry e; + +while (i < count) { + kvm_read_guest(vcpu->kvm, gpa + i * sizeof(struct vmx_msr_entry), + &e, sizeof(struct vmx_msr_entry)); + if (nested_msr_check_common(&e)) + return -EINVAL; + if (kvm_get_msr(vcpu, e.index, &e.value)) + return -EINVAL; + kvm_write_guest(vcpu->kvm, gpa + i * sizeof(struct vmx_msr_entry), + &e, sizeof(struct vmx_msr_entry)); + ++i; + } + return 0; +} + +static int nested_exit_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count) +{ + u32 i = 0; + struct vmx_msr_entry e; + struct msr_data msr; + + msr.host_initiated = false; + while (i < count) { + kvm_read_guest(vcpu->kvm, gpa + i * sizeof(struct vmx_msr_entry), + &e, sizeof(struct vmx_msr_entry)); + if (nested_load_msr_check(&e)) + return -EINVAL; + msr.index = e.index; + msr.data = e.value; + if (kvm_set_msr(vcpu, &msr)) + return -EINVAL; + ++i; + } + return 0; +} + /* * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it @@ -8509,6 +8598,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) int cpu; struct loaded_vmcs *vmcs02; bool ia32e; + u32 msr_entry_idx; if (!nested_vmx_check_permission(vcpu) || !nested_vmx_check_vmcs12(vcpu)) @@ -8556,11 +8646,12 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) return 1; } - if (vmcs12->vm_entry_msr_load_count > 0 || -vmcs12->vm_exit_msr_load_count > 0 || -vmcs12->vm_exit_msr_store_count > 0) { - pr_warn_ratelimited("%s: VMCS MSR_{LOAD,STORE} unsupported\n", -__func__); + if ((vmcs12->vm_entry_msr_load_count > 0 && + !IS_ALIGNED(vmcs12->vm_entry_msr_load_addr, 16)) || +(vmcs12->vm_exit_msr_load_count > 0 && + !IS_ALIGNED(vmcs12->vm_exit_msr_load
Re: [PATCH] KVM: nVMX: nested MSR auto load/restore emulation.
On 2014-11-22 05:24, Wincy Van wrote: > Some hypervisors need MSR auto load/restore feature. > > We read MSRs from vm-entry MSR load area which specified by L1, > and load them via kvm_set_msr in the nested entry. > When nested exit occurs, we get MSRs via kvm_get_msr, writting > them to L1`s MSR store area. After this, we read MSRs from vm-exit > MSR load area, and load them via kvm_set_msr. > > VirtualBox will work fine with this patch. Cool! This feature is long overdue. Patch is unfortunately misformatted which makes it very hard to read. Please check via linux/scripts/checkpatch.pl for the proper style. Could you also write a corresponding kvm-unit-test (see x86/vmx_tests.c)? Jan