Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"

2014-07-31 Thread David Hildenbrand
> > We have
> > - wait (wait bit in PSW)
> > - disabled wait (wait bit and interrupt fencing in PSW)
> > - STOPPED (not related to PSW, state change usually handled via service 
> > processor or hypervisor)
> >
> > I think we have to differentiate between KVM/TCG. On KVM we always do in 
> > kernel halt and qemu sees a halted only for STOPPED or disabled wait. TCG 
> > has to take care of the normal wait as well.
> >
> >  From a first glimpse, a disabled wait and STOPPED look similar, but there 
> > are (important) differences, e.g. other CPUs get a different a different 
> > result from a SIGP SENSE. This makes a big difference, e.g. for Linux 
> > guests, that send a SIGP STOP, followed by a SIGP SENSE loop until the CPU 
> > is down on hotplug (and shutdown, kexec..) So I think we agree, that 
> > handling the cpu states natively makes sense.
> >
> > The question is now only how to model it correctly without breaking TCG/KVM 
> > and reuse as much common code as possible. Correct?
> >
> > Do I understand you correctly, that your collapsing of stopped and halted 
> > is only in the qemu coding sense, IOW maybe we could just modify 
> > kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
> > implementation does not support SMP anyway?
> 
> That works for me, yes.
> 
> 
> Alex
> 

I had a look at it yesterday and it seems like we can totally drop this patch:

1. TCG doesn't support multiple CPUs and the TCG SIGP implementation isn't
ready for proper STOP/START/SENSE. Testing for STOPPED cpus in cpu_has_work()
can be dropped. To be able to support TCG was the main reason for this patch -
as we don't want to do so for now, we can leave it as is. We can still decide
to support the cpu states later using a mechanism suggest by Alex
(interrupt_requests).

Even if cpu_has_work() would make cpu.c:cpu_thread_is_idle() return false,
kvm_arch_process_async_events() called by kvm-all.c:kvm_cpu_exec() would make
it go back to sleep. Therefore a stopped VCPU will never be able to run in the
KVM case (because it always has cs->halted = true).

2. The unhalt in kvm_arch_process_async_events is for a special case where a
VCPU is in disabled wait and receives e.g. a machine-check interrupt. These
might happen in the future, for now we will never see them (the only
way to get a vcpu out of disabled wait are SIGP RESTART/CPU RESET - so we
don't break anything at that point).

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 2/2] KVM: s390: rework broken SIGP STOP interrupt handling

2014-07-31 Thread Christian Borntraeger
From: David Hildenbrand 

A VCPU might never stop if it intercepts (for whatever reason) between
"fake interrupt delivery" and execution of the stop function.

Heart of the problem is that SIGP STOP is an interrupt that has to be
processed on every SIE entry until the VCPU finally executes the stop
function.

This problem was made apparent by commit 7dfc63cf977447e09b1072911c2
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time).
With the old code, the guest could (incorrectly) inject SIGP STOPs
multiple times. The bug of losing a sigp stop exists in KVM before
7dfc63cf97, but it was hidden by Linux guests doing a sigp stop loop.
The new code (rightfully) returns CC=2 and does not queue a new
interrupt.

This patch is a simple fix of the problem. Longterm we are going to
rework that code - e.g. get rid of the action bits and so on.

Signed-off-by: David Hildenbrand 
Reviewed-by: Christian Borntraeger 
Acked-by: Cornelia Huck 
Signed-off-by: Christian Borntraeger 
[some additional patch description]
---
 arch/s390/kvm/interrupt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 1be3d8d..92528a0 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -158,6 +158,9 @@ static void __reset_intercept_indicators(struct kvm_vcpu 
*vcpu)
   LCTL_CR10 | LCTL_CR11);
vcpu->arch.sie_block->ictl |= (ICTL_STCTL | ICTL_PINT);
}
+
+   if (vcpu->arch.local_int.action_bits & ACTION_STOP_ON_STOP)
+   atomic_set_mask(CPUSTAT_STOP_INT, 
&vcpu->arch.sie_block->cpuflags);
 }
 
 static void __set_cpuflag(struct kvm_vcpu *vcpu, u32 flag)
-- 
1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 1/2] KVM: s390: Fix memory leak on busy SIGP stop

2014-07-31 Thread Christian Borntraeger
commit 7dfc63cf977447e09b1072911c22564f900fc578
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time)
introduced a memory leak if a sigp stop is already pending. Free
the allocated inti structure.

Signed-off-by: Christian Borntraeger 
Reviewed-by: David Hildenbrand 
---
 arch/s390/kvm/sigp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index c6f1c2b..cf243ba 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -139,6 +139,7 @@ static int __inject_sigp_stop(struct kvm_vcpu *dst_vcpu, 
int action)
spin_lock(&li->lock);
if (li->action_bits & ACTION_STOP_ON_STOP) {
/* another SIGP STOP is pending */
+   kfree(inti);
rc = SIGP_CC_BUSY;
goto out;
}
-- 
1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation

2014-07-31 Thread Wanpeng Li
Hi Paolo,
On Wed, Jul 30, 2014 at 05:20:58PM +0200, Paolo Bonzini wrote:
>Il 30/07/2014 14:04, Wanpeng Li ha scritto:
>> @@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
>> struct vmcs12 *vmcs12)
>>  if (!vmx->rdtscp_enabled)
>>  exec_control &= ~SECONDARY_EXEC_RDTSCP;
>>  /* Take the following fields only from vmcs12 */
>> -exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>> -  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>> +exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>>SECONDARY_EXEC_APIC_REGISTER_VIRT);
>
>This change is wrong.  You don't have to take L0's "virtualize APIC
>accesses" setting into account, because while running L2 you cannot
>modify L1's CR8 (only the virtual nested one).
>

Agreed.

>> +
>> +virtual_apic_page = nested_get_page(vcpu,
>> +vmcs12->virtual_apic_page_addr);
>> +if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
>> +page_to_phys(virtual_apic_page))
>> +vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
>> +page_to_phys(virtual_apic_page));
>> +nested_release_page(virtual_apic_page);
>> +
>
>You cannot release this page here.  You need to the exactly the same
>thing that is done for apic_access_page.
>

Agreed.

>One thing:
>
>> +if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>> +vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>
>I think you can just do this write unconditionally, since most
>hypervisors will enable this.  Also, you probably can add the tpr

What will happen if a hypervisor doesn't enable it? I make it more 
cleaner in version two.

>threshold field to the read-write fields for shadow VMCS.
>

Agreed.

Regards,
Wanpeng Li 

>Paolo
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 0/2] KVM: s390: Fixes for kvm/next (3.17)

2014-07-31 Thread Christian Borntraeger
Paolo,

The following changes since commit b55a8144d1807f9e74c51cb584f0dd198483d86c:

  x86/kvm: Resolve shadow warning from min macro (2014-07-25 16:05:54 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
tags/kvm-s390-20140730

for you to fetch changes up to db3738614767e1f2dfe69afca070d7bc46266cca:

  KVM: s390: rework broken SIGP STOP interrupt handling (2014-07-31 09:20:35 
+0200)


Two fixes for recently introduced regressions
- a memory leak on busy SIGP
- pontentially lost SIGP stop in rare situations (shutdown loops)

The first issue is not part of a released kernel. The 2nd issue is
present in all KVM versions, but did not trigger before commit
7dfc63cf977447e09b1072911c2 (KVM: s390: allow only one SIGP STOP
(AND STORE STATUS) at a time) with Linux as a guest.
So no need for cc stable


Christian Borntraeger (1):
  KVM: s390: Fix memory leak on busy SIGP stop

David Hildenbrand (1):
  KVM: s390: rework broken SIGP STOP interrupt handling

 arch/s390/kvm/interrupt.c | 3 +++
 arch/s390/kvm/sigp.c  | 1 +
 2 files changed, 4 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: PR: Handle FSCR feature deselects

2014-07-31 Thread Alexander Graf
We handle FSCR feature bits (well, TAR only really today) lazily when the guest
starts using them. So when a guest activates the bit and later uses that feature
we enable it for real in hardware.

However, when the guest stops using that bit we don't stop setting it in
hardware. That means we can potentially lose a trap that the guest expects to
happen because it thinks a feature is not active.

This patch adds support to drop TAR when then guest turns it off in FSCR. While
at it it also restricts FSCR access to 64bit systems - 32bit ones don't have it.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h | 1 +
 arch/powerpc/kvm/book3s_emulate.c | 6 +++---
 arch/powerpc/kvm/book3s_pr.c  | 9 +
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 6166791..6acf0c2 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -182,6 +182,7 @@ extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot, unsigned long *map);
 extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
unsigned long mask);
+extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 84fddcd..5a2bc4b 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -449,10 +449,10 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, 
int sprn, ulong spr_val)
case SPRN_GQR7:
to_book3s(vcpu)->gqr[sprn - SPRN_GQR0] = spr_val;
break;
+#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_FSCR:
-   vcpu->arch.fscr = spr_val;
+   kvmppc_set_fscr(vcpu, spr_val);
break;
-#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_BESCR:
vcpu->arch.bescr = spr_val;
break;
@@ -593,10 +593,10 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, 
int sprn, ulong *spr_val
case SPRN_GQR7:
*spr_val = to_book3s(vcpu)->gqr[sprn - SPRN_GQR0];
break;
+#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_FSCR:
*spr_val = vcpu->arch.fscr;
break;
-#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_BESCR:
*spr_val = vcpu->arch.bescr;
break;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index e7a1fa2..faffb27 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -871,6 +871,15 @@ static int kvmppc_handle_fac(struct kvm_vcpu *vcpu, ulong 
fac)
 
return RESUME_GUEST;
 }
+
+void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr)
+{
+   if ((vcpu->arch.fscr & FSCR_TAR) && !(fscr & FSCR_TAR)) {
+   /* TAR got dropped, drop it in shadow too */
+   kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
+   }
+   vcpu->arch.fscr = fscr;
+}
 #endif
 
 int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation

2014-07-31 Thread Paolo Bonzini
Il 31/07/2014 10:03, Wanpeng Li ha scritto:
>> One thing:
>>
>>> +   if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>>> +   vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>
>> I think you can just do this write unconditionally, since most
>> hypervisors will enable this.  Also, you probably can add the tpr
> 
> What will happen if a hypervisor doesn't enable it? I make it more 
> cleaner in version two.

TPR_THRESHOLD will be likely written as zero, but the processor will
never use it anyway.  It's just a small optimization because
nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Paolo

>> threshold field to the read-write fields for shadow VMCS.
> 
> Agreed.
> 
> Regards,
> Wanpeng Li 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Avoid using TSC clocksource on AMD APUs affected by erratum 778

2014-07-31 Thread Igor Mammedov
Fixes pvclock backwards jumps caused by TSC drifting despite
host believing that TSC is invariant/synchronized.

TSC drift maybe caused by erratum 778 described in
"Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
 Publication # 48931, Issue Date: May 2013, Revision: 3.10"


Igor Mammedov (2):
  x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh
  x86: kvm: do not advertise stable clocksource if CPU has TSC drift BUG

 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/kernel/cpu/amd.c | 9 +
 arch/x86/kvm/cpuid.c  | 3 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] x86: kvm: do not advertise stable clocksource if CPU has TSC drift BUG

2014-07-31 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
---
 arch/x86/kvm/cpuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..f519823 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -478,8 +478,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 (1 << KVM_FEATURE_CLOCKSOURCE2) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_PV_EOI) |
-(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
 (1 << KVM_FEATURE_PV_UNHALT);
+   if (!static_cpu_has_bug(X86_BUG_AMD_TSC_DRIFT))
+   entry->eax |= (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh

2014-07-31 Thread Igor Mammedov
Due to erratum #778 from
"Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
 Publication # 48931, Issue Date: May 2013, Revision: 3.10"

TSC on affected processor, a core may drift under certain conditions,
which makes initially synchronized TSCs to become unsynchronized.

As result TSC clocksource becomes unsuitable for using as wallclock
and it brakes pvclock when it's running with PVCLOCK_TSC_STABLE_BIT
flag set.
That causes backwards clock jumps when pvclock is first read on
CPU with drifted TSC and then on CPU where TSC was stable or had
a lower drift rate.

To fix issue mark TSC as unstable on affected CPU, so it won't
be used as clocksource. Which in turn disables master_clock
mechanism in KVM and force pvclock using global clock counter
that can't go backwards.

Signed-off-by: Igor Mammedov 
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/kernel/cpu/amd.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index e265ff9..c47a2a77 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -236,6 +236,7 @@
 #define X86_BUG_COMA   X86_BUG(2) /* Cyrix 6x86 coma */
 #define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* AMD Erratum 383 */
 #define X86_BUG_AMD_APIC_C1E   X86_BUG(4) /* AMD Erratum 400 */
+#define X86_BUG_AMD_TSC_DRIFT  X86_BUG(5) /* AMD Erratum 778 */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ce8b8ff..5623eb8 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -513,6 +513,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
+static const int amd_erratum_778[];
 static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
@@ -721,6 +722,11 @@ static void init_amd(struct cpuinfo_x86 *c)
if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
+   if (cpu_has_amd_erratum(c, amd_erratum_778)) {
+   set_cpu_bug(c, X86_BUG_AMD_TSC_DRIFT);
+   mark_tsc_unstable("possible TSC drift as per erratum #778");
+   }
+
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
 }
 
@@ -857,6 +863,9 @@ static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
 
+static const int amd_erratum_778[] =
+   AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x15, 0x10, 0, 0x1f, 0xf));
+
 static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
int osvw_id = *erratum++;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] KVM: perf/stat: Properly show submicrosecond times

2014-07-31 Thread Christian Borntraeger
For lots of exits the min time (and sometimes max) is 0 or 1. Lets
increase the accurancy similar to what the average field alread does.

Cc: Paolo Bonzini 
Cc: Jiri Olsa 
Cc: David Ahern 
Cc: Arnaldo Carvalho de Melo 

Signed-off-by: Christian Borntraeger 
---
 tools/perf/builtin-kvm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 43367eb..fe92dfd 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -592,8 +592,8 @@ static void print_result(struct perf_kvm_stat *kvm)
pr_info("%9s ", "Samples%");
 
pr_info("%9s ", "Time%");
-   pr_info("%10s ", "Min Time");
-   pr_info("%10s ", "Max Time");
+   pr_info("%11s ", "Min Time");
+   pr_info("%11s ", "Max Time");
pr_info("%16s ", "Avg time");
pr_info("\n\n");
 
@@ -610,8 +610,8 @@ static void print_result(struct perf_kvm_stat *kvm)
pr_info("%10llu ", (unsigned long long)ecount);
pr_info("%8.2f%% ", (double)ecount / kvm->total_count * 100);
pr_info("%8.2f%% ", (double)etime / kvm->total_time * 100);
-   pr_info("%8" PRIu64 "us ", min / 1000);
-   pr_info("%8" PRIu64 "us ", max / 1000);
+   pr_info("%9.2fus ", (double)min / 1e3);
+   pr_info("%9.2fus ", (double)max / 1e3);
pr_info("%9.2fus ( +-%7.2f%% )", (double)etime / ecount/1e3,
kvm_event_rel_stddev(vcpu, event));
pr_info("\n");
-- 
1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] increase perf kvm stat accuracy

2014-07-31 Thread Christian Borntraeger
On my s390 kvm system most of the kvm exits are in the range
of 0 or 1 microseconds. Can we increase the accuracy by 2 
additional digits?

Opinions?


Christian Borntraeger (1):
  KVM: perf/stat: Properly show submicrosecond times

 tools/perf/builtin-kvm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hang after seabios

2014-07-31 Thread Richard W.M. Jones
On Wed, Jul 30, 2014 at 03:58:43PM -0700, Zetan Drableg wrote:
>[00183ms] /usr/libexec/qemu-kvm \
>-global virtio-blk-pci.scsi=off \
>-nodefconfig \
>-nodefaults \
>-nographic \
>-machine accel=kvm:tcg \
>-cpu host,+kvmclock \
>-m 500 \
>-no-reboot \
>-kernel /var/tmp/.guestfs-0/kernel.47903 \
>-initrd /var/tmp/.guestfs-0/initrd.47903 \
>-device virtio-scsi-pci,id=scsi \
>-drive file=/tmp/libguestfs-test-tool-sda-Iakpwe,cache=none,format
>=raw,id=hd0,if=none \
>-device scsi-hd,drive=hd0 \
>-drive file=/var/tmp/.guestfs-0/root.47903,snapshot=on,id=appliance,
>if=none,cache=unsafe \
>-device scsi-hd,drive=appliance \
>-device virtio-serial \
>-serial stdio \
>-device sga \
>-chardev socket,path=/tmp/libguestfspx9994/guestfsd.sock,id=channel0
>\
>-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0
>\
>-append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check
>acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0
>guestfs_verbose=1 TERM=xterm edd=off'
>\x1b[1;256r\x1b[256;256H\x1b[6n
>Google, Inc.
>Serial Graphics Adapter 10/14/11
>SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild@
>ca-build44.us.oracle.com) Fri Oct 14 20:04:36 UTC 2011
>Term: 80x24
>4 0
>SeaBIOS (version seabios-0.6.1.2-28.el6)
>\x1b[2J
> 
> At this point it hangs forever.

These hangs can be tricky to diagnose.

There are a couple of things you can do however:

(1) strace qemu to find out what it is doing, or:

(2) [harder, but much more informative] gdb into the guest to find out
where the guest hangs, or:

(3) Take the command line above, and cut it down to try to isolate the
problematic options.  I would concentrate on the following options as
being most likely to cause trouble:

 -cpu
 -machine
 -kernel
 -device sga (remove it)

For (1) and (2) you can use a qemu wrapper to modify the qemu command
that the test tool runs.

See also:

http://libguestfs.org/guestfs.3.html#qemu-wrappers

http://rwmj.wordpress.com/2011/10/12/tip-debugging-the-early-boot-process-with-qemu-and-gdb/#content

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] vhost: Add polling mode

2014-07-31 Thread Razya Ladelsky
Resubmitting the patch in: http://marc.info/?l=kvm&m=140594903520308&w=2
after fixing the whitespaces issues.
Thank you,
Razya

>From f293e470b36ff9eb4910540c620315c418e4a8fc Mon Sep 17 00:00:00 2001
From: Razya Ladelsky 
Date: Thu, 31 Jul 2014 09:47:20 +0300
Subject: [PATCH] vhost: Add polling mode

Add an optional polling mode to continuously poll the virtqueues
for new buffers, and avoid asking the guest to kick us.

Signed-off-by: Razya Ladelsky 
---
 drivers/vhost/net.c   |6 +-
 drivers/vhost/scsi.c  |6 +-
 drivers/vhost/vhost.c |  245 +++--
 drivers/vhost/vhost.h |   38 +++-
 4 files changed, 277 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 971a760..558aecb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file 
*f)
}
vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
-   vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
-   vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
+   vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
+   vqs[VHOST_NET_VQ_TX]);
+   vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
+   vqs[VHOST_NET_VQ_RX]);
 
f->private_data = n;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4f4ffa4..665eeeb 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1528,9 +1528,9 @@ static int vhost_scsi_open(struct inode *inode, struct 
file *f)
if (!vqs)
goto err_vqs;
 
-   vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
-   vhost_work_init(&vs->vs_event_work, tcm_vhost_evt_work);
-
+   vhost_work_init(&vs->vs_completion_work, NULL,
+   vhost_scsi_complete_cmd_work);
+   vhost_work_init(&vs->vs_event_work, NULL, tcm_vhost_evt_work);
vs->vs_events_nr = 0;
vs->vs_events_missed = false;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c90f437..fbe8174 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -24,9 +24,17 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "vhost.h"
+static int poll_start_rate = 0;
+module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when 
rate of events is at least this number per jiffy. If 0, never start polling.");
+
+static int poll_stop_idle = 3*HZ; /* 3 seconds */
+module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after 
this many jiffies of no work.");
 
 enum {
VHOST_MEMORY_MAX_NREGIONS = 64,
@@ -58,27 +66,28 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned 
mode, int sync,
return 0;
 }
 
-void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+void vhost_work_init(struct vhost_work *work, struct vhost_virtqueue *vq,
+   vhost_work_fn_t fn)
 {
INIT_LIST_HEAD(&work->node);
work->fn = fn;
init_waitqueue_head(&work->done);
work->flushing = 0;
work->queue_seq = work->done_seq = 0;
+   work->vq = vq;
 }
 EXPORT_SYMBOL_GPL(vhost_work_init);
 
 /* Init poll structure */
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-unsigned long mask, struct vhost_dev *dev)
+unsigned long mask, struct vhost_virtqueue *vq)
 {
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
init_poll_funcptr(&poll->table, vhost_poll_func);
poll->mask = mask;
-   poll->dev = dev;
+   poll->dev = vq->dev;
poll->wqh = NULL;
-
-   vhost_work_init(&poll->work, fn);
+   vhost_work_init(&poll->work, vq, fn);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_init);
 
@@ -174,6 +183,86 @@ void vhost_poll_queue(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
+/* Enable or disable virtqueue polling (vqpoll.enabled) for a virtqueue.
+ *
+ * Enabling this mode it tells the guest not to notify ("kick") us when its
+ * has made more work available on this virtqueue; Rather, we will continuously
+ * poll this virtqueue in the worker thread. If multiple virtqueues are polled,
+ * the worker thread polls them all, e.g., in a round-robin fashion.
+ * Note that vqpoll.enabled doesn't always mean that this virtqueue is
+ * actually being polled: The backend (e.g., net.c) may temporarily disable it
+ * using vhost_disable/enable_notify(), while vqpoll.enabled is unchanged.
+ *
+ * It is assumed that these functions are called relatively rarely, when vhost
+ * notices that this virtqueue's usage pattern significantly changed in a way
+ * that makes polling more efficient than notifica

Re: [PATCH V2 3/4] x86/kvm: Resolve shadow warnings in macro expansion

2014-07-31 Thread Paolo Bonzini
Il 30/07/2014 23:19, Mark D Rustad ha scritto:
> Resolve shadow warnings that appear in W=2 builds. Instead of
> using ret to hold the return pointer, save the length in a new
> variable saved_len and compute the pointer on exit. This also
> resolves a very technical error, in that ret was declared as
> a const char *, when it really was a char * const, which
> theoretically could have allowed the compiler to do something
> wrong.
> 
> Signed-off-by: Mark Rustad 
> Signed-off-by: Jeff Kirsher 
> 
> ---
> Changes in V2:
> - Instead of renaming all inner variables, just delete the
>   ret variable in favor of the new saved_len variable.
> ---
>  arch/x86/kvm/mmutrace.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> index 9d2e0ffcb190..5aaf35641768 100644
> --- a/arch/x86/kvm/mmutrace.h
> +++ b/arch/x86/kvm/mmutrace.h
> @@ -22,7 +22,7 @@
>   __entry->unsync = sp->unsync;
>  
>  #define KVM_MMU_PAGE_PRINTK() ({ \
> - const char *ret = p->buffer + p->len;   \
> + const u32 saved_len = p->len;   \
>   static const char *access_str[] = { \
>   "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"  \
>   };  \
> @@ -41,7 +41,7 @@
>role.nxe ? "" : "!",   \
>__entry->root_count,   \
>__entry->unsync ? "unsync" : "sync", 0);   \
> - ret;\
> + p->buffer + saved_len;  \
>   })
>  
>  #define kvm_mmu_trace_pferr_flags   \
> 

Applying this patch, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 1/4] x86/kvm: Resolve some missing-initializers warnings

2014-07-31 Thread Paolo Bonzini
Il 30/07/2014 23:18, Mark D Rustad ha scritto:
> Resolve some missing-initializers warnings that appear in W=2
> builds. They are resolved by adding the name as a parameter to
> the macros and having the macro generate all four fields of the
> structure.
> 
> Signed-off-by: Mark Rustad 
> Signed-off-by: Jeff Kirsher 
> 
> ---
> V2: Change macro to supply all four fields instead of using a
> designated initializer. Also fix up the array terminator.
> ---
>  arch/x86/kvm/x86.c |   71 
> ++--
>  1 file changed, 36 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ef432f891d30..623aea52ceba 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -82,8 +82,9 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | 
> EFER_LME | EFER_LMA));
>  static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>  #endif
>  
> -#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
> -#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
> +#define VM_STAT(name, x) name, offsetof(struct kvm, stat.x), KVM_STAT_VM, 
> NULL
> +#define VCPU_STAT(name, x) name, offsetof(struct kvm_vcpu, stat.x), \
> +KVM_STAT_VCPU, NULL
>  
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu);
>  static void process_nmi(struct kvm_vcpu *vcpu);
> @@ -128,39 +129,39 @@ static struct kvm_shared_msrs_global __read_mostly 
> shared_msrs_global;
>  static struct kvm_shared_msrs __percpu *shared_msrs;
>  
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
> - { "pf_fixed", VCPU_STAT(pf_fixed) },
> - { "pf_guest", VCPU_STAT(pf_guest) },
> - { "tlb_flush", VCPU_STAT(tlb_flush) },
> - { "invlpg", VCPU_STAT(invlpg) },
> - { "exits", VCPU_STAT(exits) },
> - { "io_exits", VCPU_STAT(io_exits) },
> - { "mmio_exits", VCPU_STAT(mmio_exits) },
> - { "signal_exits", VCPU_STAT(signal_exits) },
> - { "irq_window", VCPU_STAT(irq_window_exits) },
> - { "nmi_window", VCPU_STAT(nmi_window_exits) },
> - { "halt_exits", VCPU_STAT(halt_exits) },
> - { "halt_wakeup", VCPU_STAT(halt_wakeup) },
> - { "hypercalls", VCPU_STAT(hypercalls) },
> - { "request_irq", VCPU_STAT(request_irq_exits) },
> - { "irq_exits", VCPU_STAT(irq_exits) },
> - { "host_state_reload", VCPU_STAT(host_state_reload) },
> - { "efer_reload", VCPU_STAT(efer_reload) },
> - { "fpu_reload", VCPU_STAT(fpu_reload) },
> - { "insn_emulation", VCPU_STAT(insn_emulation) },
> - { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
> - { "irq_injections", VCPU_STAT(irq_injections) },
> - { "nmi_injections", VCPU_STAT(nmi_injections) },
> - { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
> - { "mmu_pte_write", VM_STAT(mmu_pte_write) },
> - { "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
> - { "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) },
> - { "mmu_flooded", VM_STAT(mmu_flooded) },
> - { "mmu_recycled", VM_STAT(mmu_recycled) },
> - { "mmu_cache_miss", VM_STAT(mmu_cache_miss) },
> - { "mmu_unsync", VM_STAT(mmu_unsync) },
> - { "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
> - { "largepages", VM_STAT(lpages) },
> - { NULL }
> + { VCPU_STAT("pf_fixed", pf_fixed) },
> + { VCPU_STAT("pf_guest", pf_guest) },
> + { VCPU_STAT("tlb_flush", tlb_flush) },
> + { VCPU_STAT("invlpg", invlpg) },
> + { VCPU_STAT("exits", exits) },
> + { VCPU_STAT("io_exits", io_exits) },
> + { VCPU_STAT("mmio_exits", mmio_exits) },
> + { VCPU_STAT("signal_exits", signal_exits) },
> + { VCPU_STAT("irq_window", irq_window_exits) },
> + { VCPU_STAT("nmi_window", nmi_window_exits) },
> + { VCPU_STAT("halt_exits", halt_exits) },
> + { VCPU_STAT("halt_wakeup", halt_wakeup) },
> + { VCPU_STAT("hypercalls", hypercalls) },
> + { VCPU_STAT("request_irq", request_irq_exits) },
> + { VCPU_STAT("irq_exits", irq_exits) },
> + { VCPU_STAT("host_state_reload", host_state_reload) },
> + { VCPU_STAT("efer_reload", efer_reload) },
> + { VCPU_STAT("fpu_reload", fpu_reload) },
> + { VCPU_STAT("insn_emulation", insn_emulation) },
> + { VCPU_STAT("insn_emulation_fail", insn_emulation_fail) },
> + { VCPU_STAT("irq_injections", irq_injections) },
> + { VCPU_STAT("nmi_injections", nmi_injections) },
> + { VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped) },
> + { VM_STAT("mmu_pte_write", mmu_pte_write) },
> + { VM_STAT("mmu_pte_updated", mmu_pte_updated) },
> + { VM_STAT("mmu_pde_zapped", mmu_pde_zapped) },
> + { VM_STAT("mmu_flooded", mmu_flooded) },
> + { VM_STAT("mmu_recycled", mmu_recycled) },
> + { VM_STAT("mmu_cache_miss", mmu_cache_miss) },
> + { VM_STAT("mmu_unsync", mmu_unsync) },
> + { VM_STAT("remote_tlb_flush", remote_tlb_flush) },
> + { VM_STAT("largepages", lpages) },

Please move the braces inside the macro as well.

> +   

Re: [PATCH v5 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-31 Thread Paolo Bonzini
Il 24/07/2014 06:57, Andy Lutomirski ha scritto:
> This adds a simple interface to allow a guest to request 64 bits of
> host nonblocking entropy.  This is independent of virtio-rng for a
> couple of reasons:
> 
>  - It's intended to be usable during early boot, when a trivial
>synchronous interface is needed.
> 
>  - virtio-rng gives blocking entropy, and making guest boot wait for
>the host's /dev/random will cause problems.
> 
> MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
> cryptographically secure data for use as a seed.  It provides no
> guarantee that the result contains any actual entropy.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  Documentation/virtual/kvm/cpuid.txt  | 3 +++
>  arch/x86/include/uapi/asm/kvm_para.h | 2 ++
>  arch/x86/kvm/cpuid.c | 3 ++-
>  arch/x86/kvm/x86.c   | 4 
>  4 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virtual/kvm/cpuid.txt 
> b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..0ab043b 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks 
> this feature bit
> ||   || before enabling 
> paravirtualized
> ||   || spinlock support.
>  
> --
> +KVM_FEATURE_GET_RNG_SEED   || 8 || host provides rng seed data 
> via
> +   ||   || MSR_KVM_GET_RNG_SEED.
> +--
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no 
> guest-side
> ||   || per-cpu warps are expected in
> ||   || kvmclock.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
> b/arch/x86/include/uapi/asm/kvm_para.h
> index 94dc8ca..e2eaf93 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -24,6 +24,7 @@
>  #define KVM_FEATURE_STEAL_TIME   5
>  #define KVM_FEATURE_PV_EOI   6
>  #define KVM_FEATURE_PV_UNHALT7
> +#define KVM_FEATURE_GET_RNG_SEED 8
>  
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -40,6 +41,7 @@
>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>  #define MSR_KVM_PV_EOI_EN  0x4b564d04
> +#define MSR_KVM_GET_RNG_SEED 0x4b564d05
>  
>  struct kvm_steal_time {
>   __u64 steal;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 38a0afe..40d6763 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>(1 << KVM_FEATURE_ASYNC_PF) |
>(1 << KVM_FEATURE_PV_EOI) |
>(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> -  (1 << KVM_FEATURE_PV_UNHALT);
> +  (1 << KVM_FEATURE_PV_UNHALT) |
> +  (1 << KVM_FEATURE_GET_RNG_SEED);
>  
>   if (sched_info_on())
>   entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f644933..4e81853 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -48,6 +48,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #define CREATE_TRACE_POINTS
> @@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
> u64 *pdata)
>   case MSR_KVM_PV_EOI_EN:
>   data = vcpu->arch.pv_eoi.msr_val;
>   break;
> + case MSR_KVM_GET_RNG_SEED:
> + get_random_bytes(&data, sizeof(data));
> + break;
>   case MSR_IA32_P5_MC_ADDR:
>   case MSR_IA32_P5_MC_TYPE:
>   case MSR_IA32_MCG_CAP:
> 

Acked-by: Paolo Bonzini 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed

2014-07-31 Thread Paolo Bonzini
Il 24/07/2014 06:57, Andy Lutomirski ha scritto:
> This is a straightforward implementation: for each bit of internal
> RNG state, request one bit from KVM_GET_RNG_SEED.  This is done even
> if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
> cryptographically secure output even if the CPU's RNG is weak or
> compromised.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/Kconfig |  4 
>  arch/x86/include/asm/kvm_guest.h |  9 +
>  arch/x86/kernel/archrandom.c | 25 -
>  arch/x86/kernel/kvm.c| 10 ++
>  4 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a8f749e..adfa09c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -593,6 +593,7 @@ config KVM_GUEST
>   bool "KVM Guest support (including kvmclock)"
>   depends on PARAVIRT
>   select PARAVIRT_CLOCK
> + select ARCH_RANDOM
>   default y
>   ---help---
> This option enables various optimizations for running under the KVM
> @@ -1507,6 +1508,9 @@ config ARCH_RANDOM
> If supported, this is a high bandwidth, cryptographically
> secure hardware random number generator.
>  
> +   This also enables paravirt RNGs such as KVM's if the relevant
> +   PV guest support is enabled.
> +
>  config X86_SMAP
>   def_bool y
>   prompt "Supervisor Mode Access Prevention" if EXPERT
> diff --git a/arch/x86/include/asm/kvm_guest.h 
> b/arch/x86/include/asm/kvm_guest.h
> index a92b176..8c4dbd5 100644
> --- a/arch/x86/include/asm/kvm_guest.h
> +++ b/arch/x86/include/asm/kvm_guest.h
> @@ -3,4 +3,13 @@
>  
>  int kvm_setup_vsyscall_timeinfo(void);
>  
> +#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
> +extern bool kvm_get_rng_seed(u64 *rv);
> +#else
> +static inline bool kvm_get_rng_seed(u64 *rv)
> +{
> + return false;
> +}
> +#endif
> +
>  #endif /* _ASM_X86_KVM_GUEST_H */
> diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
> index 47d13b0..8c8d021 100644
> --- a/arch/x86/kernel/archrandom.c
> +++ b/arch/x86/kernel/archrandom.c
> @@ -15,6 +15,7 @@
>   */
>  
>  #include 
> +#include 
>  
>  void arch_get_rng_seed(void *ctx,
>  void (*seed)(void *ctx, u32 data),
> @@ -22,7 +23,7 @@ void arch_get_rng_seed(void *ctx,
>  const char *log_prefix)
>  {
>   int i;
> - int rdseed_bits = 0, rdrand_bits = 0;
> + int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
>   char buf[128] = "";
>   char *msgptr = buf;
>  
> @@ -42,10 +43,32 @@ void arch_get_rng_seed(void *ctx,
>  #endif
>   }
>  
> + /*
> +  * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
> +  * worked, since it incorporates entropy unavailable to the CPU,
> +  * and we shouldn't trust the hardware RNG more than we need to.
> +  * We request enough bits for the entire internal RNG state,
> +  * because there's no good reason not to.
> +  */
> + for (i = 0; i < bits_per_source; i += 64) {
> + u64 rv;
> +
> + if (kvm_get_rng_seed(&rv)) {
> + seed(ctx, (u32)rv);
> + seed(ctx, (u32)(rv >> 32));
> + kvm_bits += 8 * sizeof(rv);
> + } else {
> + break;  /* If it fails once, it will keep failing. */
> + }
> + }
> +
>   if (rdseed_bits)
>   msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
>   if (rdrand_bits)
>   msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
> + if (kvm_bits)
> + msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
> +   kvm_bits);
>   if (buf[0])
>   pr_info("%s with %s\n", log_prefix, buf + 2);
>  }
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 3dd8e2c..bd8783a 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
>   wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
>  }
>  
> +bool kvm_get_rng_seed(u64 *v)
> +{
> + /*
> +  * Allow migration from a hypervisor with the GET_RNG_SEED
> +  * feature to a hypervisor without it.
> +  */
> + return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
> + rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
> +}
> +
>  #ifdef CONFIG_SMP
>  static void __init kvm_smp_prepare_boot_cpu(void)
>  {
> 

Acked-by: Paolo Bonzini 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-31 Thread Paolo Bonzini
Il 24/07/2014 06:57, Andy Lutomirski ha scritto:
> It's considerably better than any of the alternatives on KVM.
> 
> Rather than reinventing all of the cpu feature query code, this fixes
> native_cpuid to work in PIC objects.
> 
> I haven't combined it with boot/cpuflags.c's cpuid implementation:
> including asm/processor.h from boot/cpuflags.c results in a flood of
> unrelated errors, and fixing it might be messy.
> 
> Reviewed-by: Kees Cook 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/boot/compressed/aslr.c  | 27 +++
>  arch/x86/include/asm/processor.h | 21 ++---
>  2 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index fc6091a..8583f0e 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -5,6 +5,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  #include 
>  #include 
>  #include 
> @@ -15,6 +17,22 @@
>  static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>  
> +static bool kvm_para_has_feature(unsigned int feature)
> +{
> + u32 kvm_base;
> + u32 features;
> +
> + if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
> + return false;
> +
> + kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
> + if (!kvm_base)
> + return false;
> +
> + features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
> + return features & (1UL << feature);
> +}
> +
>  #define I8254_PORT_CONTROL   0x43
>  #define I8254_PORT_COUNTER0  0x40
>  #define I8254_CMD_READBACK   0xC0
> @@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
>   }
>   }
>  
> + if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
> + u64 seed;
> +
> + debug_putstr(" MSR_KVM_GET_RNG_SEED");
> + rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
> + random ^= (unsigned long)seed;
> + use_i8254 = false;
> + }
> +
>   if (has_cpuflag(X86_FEATURE_TSC)) {
>   debug_putstr(" RDTSC");
>   rdtscll(raw);
> diff --git a/arch/x86/include/asm/processor.h 
> b/arch/x86/include/asm/processor.h
> index a4ea023..6096f3c 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
>  static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
>   unsigned int *ecx, unsigned int *edx)
>  {
> - /* ecx is often an input as well as an output. */
> - asm volatile("cpuid"
> + /*
> +  * This function can be used from the boot code, so it needs
> +  * to avoid using EBX in constraints in PIC mode.
> +  *
> +  * ecx is often an input as well as an output.
> +  */
> + asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
> +  "movl  %%ebx,%1\n\t"
> +  ".endif ; .endif   \n\t"
> +  "cpuid \n\t"
> +  ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
> +  "xchgl %%ebx,%1\n\t"
> +  ".endif ; .endif"
>   : "=a" (*eax),
> -   "=b" (*ebx),
> +#if defined(__i386__) && defined(__PIC__)
> +   "=r" (*ebx),  /* gcc won't let us use ebx */
> +#else
> +   "=b" (*ebx),  /* ebx is okay */
> +#endif
> "=c" (*ecx),
> "=d" (*edx)
>   : "0" (*eax), "2" (*ecx)
> 

Acked-by: Paolo Bonzini 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] KVM: device: add simple registration mechanism for kvm_device_ops

2014-07-31 Thread Christoffer Dall
On Tue, Jul 01, 2014 at 03:45:15PM +0100, Will Deacon wrote:
> kvm_ioctl_create_device currently has knowledge of all the device types
> and their associated ops. This is fairly inflexible when adding support
> for new in-kernel device emulations, so move what we currently have out
> into a table, which can support dynamic registration of ops by new
> drivers for virtual hardware.
> 
> I didn't try to port all current drivers over, as it's not always clear
> which initialisation hook the ops should be registered from.
> 
> Cc: Cornelia Huck 
> Cc: Alex Williamson 
> Cc: Alex Graf 
> Cc: Gleb Natapov 
> Cc: Paolo Bonzini 
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Signed-off-by: Will Deacon 
> ---
> 
Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] KVM: ARM: vgic: register kvm_device_ops dynamically

2014-07-31 Thread Christoffer Dall
On Tue, Jul 01, 2014 at 03:45:16PM +0100, Will Deacon wrote:
> Now that we have a dynamic means to register kvm_device_ops, use that
> for the ARM VGIC, instead of relying on the static table.
> 
> Cc: Gleb Natapov 
> Cc: Paolo Bonzini 
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Signed-off-by: Will Deacon 
> ---
>  include/linux/kvm_host.h |   1 -
>  virt/kvm/arm/vgic.c  | 156 
> +++
>  virt/kvm/kvm_main.c  |   4 --
>  3 files changed, 78 insertions(+), 83 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b75faaf0d76d..4317fdd10696 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1088,7 +1088,6 @@ int kvm_register_device_ops(struct kvm_device_ops *ops, 
> u32 type);
>  extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
>  extern struct kvm_device_ops kvm_vfio_ops;
> -extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>  extern struct kvm_device_ops kvm_flic_ops;
>  
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 795ab482333d..d9d0bcebad2b 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1502,83 +1502,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>   return 0;
>  }
>  
> -static void vgic_init_maintenance_interrupt(void *info)
> -{
> - enable_percpu_irq(vgic->maint_irq, 0);
> -}
> -
> -static int vgic_cpu_notify(struct notifier_block *self,
> -unsigned long action, void *cpu)
> -{
> - switch (action) {
> - case CPU_STARTING:
> - case CPU_STARTING_FROZEN:
> - vgic_init_maintenance_interrupt(NULL);
> - break;
> - case CPU_DYING:
> - case CPU_DYING_FROZEN:
> - disable_percpu_irq(vgic->maint_irq);
> - break;
> - }
> -
> - return NOTIFY_OK;
> -}
> -
> -static struct notifier_block vgic_cpu_nb = {
> - .notifier_call = vgic_cpu_notify,
> -};
> -
> -static const struct of_device_id vgic_ids[] = {
> - { .compatible = "arm,cortex-a15-gic", .data = vgic_v2_probe, },
> - { .compatible = "arm,gic-v3", .data = vgic_v3_probe, },
> - {},
> -};
> -
> -int kvm_vgic_hyp_init(void)
> -{
> - const struct of_device_id *matched_id;
> - int (*vgic_probe)(struct device_node *,const struct vgic_ops **,
> -   const struct vgic_params **);
> - struct device_node *vgic_node;
> - int ret;
> -
> - vgic_node = of_find_matching_node_and_match(NULL,
> - vgic_ids, &matched_id);
> - if (!vgic_node) {
> - kvm_err("error: no compatible GIC node found\n");
> - return -ENODEV;
> - }
> -
> - vgic_probe = matched_id->data;
> - ret = vgic_probe(vgic_node, &vgic_ops, &vgic);
> - if (ret)
> - return ret;
> -
> - ret = request_percpu_irq(vgic->maint_irq, vgic_maintenance_handler,
> -  "vgic", kvm_get_running_vcpus());
> - if (ret) {
> - kvm_err("Cannot register interrupt %d\n", vgic->maint_irq);
> - return ret;
> - }
> -
> - ret = __register_cpu_notifier(&vgic_cpu_nb);
> - if (ret) {
> - kvm_err("Cannot register vgic CPU notifier\n");
> - goto out_free_irq;
> - }
> -
> - on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
> -
> - /* Callback into for arch code for setup */
> - vgic_arch_setup(vgic);
> -
> - return 0;
> -
> -out_free_irq:
> - free_percpu_irq(vgic->maint_irq, kvm_get_running_vcpus());
> - return ret;
> -}
> -
>  /**
>   * kvm_vgic_init - Initialize global VGIC state before running any VCPUs
>   * @kvm: pointer to the kvm struct
> @@ -2042,7 +1965,7 @@ static int vgic_create(struct kvm_device *dev, u32 type)
>   return kvm_vgic_create(dev->kvm);
>  }
>  
> -struct kvm_device_ops kvm_arm_vgic_v2_ops = {
> +static struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>   .name = "kvm-arm-vgic",
>   .create = vgic_create,
>   .destroy = vgic_destroy,
> @@ -2050,3 +1973,80 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>   .get_attr = vgic_get_attr,
>   .has_attr = vgic_has_attr,
>  };
> +
> +static void vgic_init_maintenance_interrupt(void *info)
> +{
> + enable_percpu_irq(vgic->maint_irq, 0);
> +}
> +
> +static int vgic_cpu_notify(struct notifier_block *self,
> +unsigned long action, void *cpu)
> +{
> + switch (action) {
> + case CPU_STARTING:
> + case CPU_STARTING_FROZEN:
> + vgic_init_maintenance_interrupt(NULL);
> + break;
> + case CPU_DYING:
> + case CPU_DYING_FROZEN:
> + disable_percpu_irq(vgic->maint_irq);
> + break;
> + }
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block vgic_cpu_nb = {
> + .notifier_call = vgic_cpu_notify,
> +};
> +
> +static const struct of

[PATCH][next] arm64: KVM: GICv3: move system register access to msr_s/mrs_s

2014-07-31 Thread Marc Zyngier
Commit 72c583951526 (arm64: gicv3: Allow GICv3 compilation with
older binutils) changed the way we express the GICv3 system registers,
but couldn't change the occurences used by KVM as the code wasn't
merged yet.

Just fix the accessors.

Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: Christoffer Dall 
Signed-off-by: Marc Zyngier 
---
-next is currently borked. I suggest we take this patch via the kvmarm tree,
and only send our PR once the arm64 tree has hit Linus' tree. It contains
the same dependency on the GIC tree anyway.

 arch/arm64/kvm/vgic-v3-switch.S | 130 
 1 file changed, 65 insertions(+), 65 deletions(-)

diff --git a/arch/arm64/kvm/vgic-v3-switch.S b/arch/arm64/kvm/vgic-v3-switch.S
index 21e68f6..d160469 100644
--- a/arch/arm64/kvm/vgic-v3-switch.S
+++ b/arch/arm64/kvm/vgic-v3-switch.S
@@ -48,11 +48,11 @@
dsb st
 
// Save all interesting registers
-   mrs x4, ICH_HCR_EL2
-   mrs x5, ICH_VMCR_EL2
-   mrs x6, ICH_MISR_EL2
-   mrs x7, ICH_EISR_EL2
-   mrs x8, ICH_ELSR_EL2
+   mrs_s   x4, ICH_HCR_EL2
+   mrs_s   x5, ICH_VMCR_EL2
+   mrs_s   x6, ICH_MISR_EL2
+   mrs_s   x7, ICH_EISR_EL2
+   mrs_s   x8, ICH_ELSR_EL2
 
str w4, [x3, #VGIC_V3_CPU_HCR]
str w5, [x3, #VGIC_V3_CPU_VMCR]
@@ -60,9 +60,9 @@
str w7, [x3, #VGIC_V3_CPU_EISR]
str w8, [x3, #VGIC_V3_CPU_ELRSR]
 
-   msr ICH_HCR_EL2, xzr
+   msr_s   ICH_HCR_EL2, xzr
 
-   mrs x21, ICH_VTR_EL2
+   mrs_s   x21, ICH_VTR_EL2
mvn w22, w21
ubfiz   w23, w22, 2, 4  // w23 = (15 - ListRegs) * 4
 
@@ -71,22 +71,22 @@
br  x24
 
 1:
-   mrs x20, ICH_LR15_EL2
-   mrs x19, ICH_LR14_EL2
-   mrs x18, ICH_LR13_EL2
-   mrs x17, ICH_LR12_EL2
-   mrs x16, ICH_LR11_EL2
-   mrs x15, ICH_LR10_EL2
-   mrs x14, ICH_LR9_EL2
-   mrs x13, ICH_LR8_EL2
-   mrs x12, ICH_LR7_EL2
-   mrs x11, ICH_LR6_EL2
-   mrs x10, ICH_LR5_EL2
-   mrs x9, ICH_LR4_EL2
-   mrs x8, ICH_LR3_EL2
-   mrs x7, ICH_LR2_EL2
-   mrs x6, ICH_LR1_EL2
-   mrs x5, ICH_LR0_EL2
+   mrs_s   x20, ICH_LR15_EL2
+   mrs_s   x19, ICH_LR14_EL2
+   mrs_s   x18, ICH_LR13_EL2
+   mrs_s   x17, ICH_LR12_EL2
+   mrs_s   x16, ICH_LR11_EL2
+   mrs_s   x15, ICH_LR10_EL2
+   mrs_s   x14, ICH_LR9_EL2
+   mrs_s   x13, ICH_LR8_EL2
+   mrs_s   x12, ICH_LR7_EL2
+   mrs_s   x11, ICH_LR6_EL2
+   mrs_s   x10, ICH_LR5_EL2
+   mrs_s   x9, ICH_LR4_EL2
+   mrs_s   x8, ICH_LR3_EL2
+   mrs_s   x7, ICH_LR2_EL2
+   mrs_s   x6, ICH_LR1_EL2
+   mrs_s   x5, ICH_LR0_EL2
 
adr x24, 1f
add x24, x24, x23
@@ -113,34 +113,34 @@
tbnzw21, #29, 6f// 6 bits
tbz w21, #30, 5f// 5 bits
// 7 bits
-   mrs x20, ICH_AP0R3_EL2
+   mrs_s   x20, ICH_AP0R3_EL2
str w20, [x3, #(VGIC_V3_CPU_AP0R + 3*4)]
-   mrs x19, ICH_AP0R2_EL2
+   mrs_s   x19, ICH_AP0R2_EL2
str w19, [x3, #(VGIC_V3_CPU_AP0R + 2*4)]
-6: mrs x18, ICH_AP0R1_EL2
+6: mrs_s   x18, ICH_AP0R1_EL2
str w18, [x3, #(VGIC_V3_CPU_AP0R + 1*4)]
-5: mrs x17, ICH_AP0R0_EL2
+5: mrs_s   x17, ICH_AP0R0_EL2
str w17, [x3, #VGIC_V3_CPU_AP0R]
 
tbnzw21, #29, 6f// 6 bits
tbz w21, #30, 5f// 5 bits
// 7 bits
-   mrs x20, ICH_AP1R3_EL2
+   mrs_s   x20, ICH_AP1R3_EL2
str w20, [x3, #(VGIC_V3_CPU_AP1R + 3*4)]
-   mrs x19, ICH_AP1R2_EL2
+   mrs_s   x19, ICH_AP1R2_EL2
str w19, [x3, #(VGIC_V3_CPU_AP1R + 2*4)]
-6: mrs x18, ICH_AP1R1_EL2
+6: mrs_s   x18, ICH_AP1R1_EL2
str w18, [x3, #(VGIC_V3_CPU_AP1R + 1*4)]
-5: mrs x17, ICH_AP1R0_EL2
+5: mrs_s   x17, ICH_AP1R0_EL2
str w17, [x3, #VGIC_V3_CPU_AP1R]
 
// Restore SRE_EL1 access and re-enable SRE at EL1.
-   mrs x5, ICC_SRE_EL2
+   mrs_s   x5, ICC_SRE_EL2
orr x5, x5, #ICC_SRE_EL2_ENABLE
-   msr ICC_SRE_EL2, x5
+   msr_s   ICC_SRE_EL2, x5
isb
mov x5, #1
-   msr ICC_SRE_EL1, x5
+   msr_s   ICC_SRE_EL1, x5
 .endm
 
 /*
@@ -150,7 +150,7 @@
 .macro restore_vgic_v3_state
// Disable SRE_EL1 access. Necessary, otherwise
// ICH_VMCR_EL2.VFIQEn becomes one, and FIQ happens...
-   msr ICC_SRE_EL1, xzr
+   msr_s   ICC_SRE_EL1, xzr
isb
 
// Compute the address of struct vgic_cpu
@@ -160,34 +160,34 @@
ldr w4, [x3, #VGIC_V3_CPU_HCR]
ldr w5, [x3, #VGIC_V3_CPU_VMCR]
 
-   msr ICH_HCR_EL2, x4
-   msr ICH_VMCR_EL2, x5
+   msr_s   ICH_HCR_EL2, x4
+   msr_s   ICH_VMCR_EL2, x5
 
-   mrs x21, ICH_VTR_EL

Re: [PATCH][next] arm64: KVM: GICv3: move system register access to msr_s/mrs_s

2014-07-31 Thread Will Deacon
On Thu, Jul 31, 2014 at 02:16:39PM +0100, Marc Zyngier wrote:
> Commit 72c583951526 (arm64: gicv3: Allow GICv3 compilation with
> older binutils) changed the way we express the GICv3 system registers,
> but couldn't change the occurences used by KVM as the code wasn't
> merged yet.
> 
> Just fix the accessors.
> 
> Cc: Will Deacon 
> Cc: Catalin Marinas 
> Cc: Christoffer Dall 
> Signed-off-by: Marc Zyngier 
> ---
> -next is currently borked. I suggest we take this patch via the kvmarm tree,
> and only send our PR once the arm64 tree has hit Linus' tree. It contains
> the same dependency on the GIC tree anyway.

I'm fine with that as long as Paulo doesn't mind waiting for the arm64 stuff
to go in first (I have no reason to delay my pull request, so shouldn't be
an issue).

If it helps:

  Acked-by: Will Deacon 

I can't help but point out that we wouldn't be in this mess if you'd got
your stuff into -next sooner ;)

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] KVM: ARM: vgic: register kvm_device_ops dynamically

2014-07-31 Thread Will Deacon
Hi Christoffer,

On Thu, Jul 31, 2014 at 01:10:15PM +0100, Christoffer Dall wrote:
> On Tue, Jul 01, 2014 at 03:45:16PM +0100, Will Deacon wrote:
> > Now that we have a dynamic means to register kvm_device_ops, use that
> > for the ARM VGIC, instead of relying on the static table.
> > 
> > Cc: Gleb Natapov 
> > Cc: Paolo Bonzini 
> > Cc: Marc Zyngier 
> > Cc: Christoffer Dall 
> > Signed-off-by: Will Deacon 
> > ---
> >  include/linux/kvm_host.h |   1 -
> >  virt/kvm/arm/vgic.c  | 156 
> > +++
> >  virt/kvm/kvm_main.c  |   4 --
> >  3 files changed, 78 insertions(+), 83 deletions(-)

[...]

> > +   if (ret) {
> > +   kvm_err("Cannot register vgic CPU notifier\n");
> > +   goto out_free_irq;
> > +   }
> > +
> > +   on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
> > +
> > +   /* Callback into for arch code for setup */
> > +   vgic_arch_setup(vgic);
> 
> close to ridiculous nit: but I would add a newline here since the
> comment doesn't really apply to the kvm_register_device_ops, but please
> don't make another revision just for this.

I'll repost this after the merge window anyway, so there's plenty of room
for pedantic, cosmetic changes!

Thanks,

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][next] arm64: KVM: GICv3: move system register access to msr_s/mrs_s

2014-07-31 Thread Christoffer Dall
On Thu, Jul 31, 2014 at 02:19:47PM +0100, Will Deacon wrote:
> On Thu, Jul 31, 2014 at 02:16:39PM +0100, Marc Zyngier wrote:
> > Commit 72c583951526 (arm64: gicv3: Allow GICv3 compilation with
> > older binutils) changed the way we express the GICv3 system registers,
> > but couldn't change the occurences used by KVM as the code wasn't
> > merged yet.
> > 
> > Just fix the accessors.
> > 
> > Cc: Will Deacon 
> > Cc: Catalin Marinas 
> > Cc: Christoffer Dall 
> > Signed-off-by: Marc Zyngier 
> > ---
> > -next is currently borked. I suggest we take this patch via the kvmarm tree,
> > and only send our PR once the arm64 tree has hit Linus' tree. It contains
> > the same dependency on the GIC tree anyway.
> 
> I'm fine with that as long as Paulo doesn't mind waiting for the arm64 stuff
> to go in first (I have no reason to delay my pull request, so shouldn't be
> an issue).
> 
> If it helps:
> 
>   Acked-by: Will Deacon 
> 
> I can't help but point out that we wouldn't be in this mess if you'd got
> your stuff into -next sooner ;)
> 

Well yes, but it was hard to plan our holidays around the deadlines and
unfortunately I couldn't find time to verify the kvmarm branch to make
it ready for -next inclusion before I went on holiday this time.  In any
case, this should be the exception rather than the rule, and I do try to
get the next branch ready as soon as possible usually.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] KVM: perf/stat: Properly show submicrosecond times

2014-07-31 Thread David Ahern

On 7/31/14, 5:13 AM, Christian Borntraeger wrote:

For lots of exits the min time (and sometimes max) is 0 or 1. Lets
increase the accurancy similar to what the average field alread does.


Seems reasonable to me.

Acked-by: David Ahern 




Cc: Paolo Bonzini 
Cc: Jiri Olsa 
Cc: David Ahern 
Cc: Arnaldo Carvalho de Melo 

Signed-off-by: Christian Borntraeger 
---
  tools/perf/builtin-kvm.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 43367eb..fe92dfd 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -592,8 +592,8 @@ static void print_result(struct perf_kvm_stat *kvm)
pr_info("%9s ", "Samples%");

pr_info("%9s ", "Time%");
-   pr_info("%10s ", "Min Time");
-   pr_info("%10s ", "Max Time");
+   pr_info("%11s ", "Min Time");
+   pr_info("%11s ", "Max Time");
pr_info("%16s ", "Avg time");
pr_info("\n\n");

@@ -610,8 +610,8 @@ static void print_result(struct perf_kvm_stat *kvm)
pr_info("%10llu ", (unsigned long long)ecount);
pr_info("%8.2f%% ", (double)ecount / kvm->total_count * 100);
pr_info("%8.2f%% ", (double)etime / kvm->total_time * 100);
-   pr_info("%8" PRIu64 "us ", min / 1000);
-   pr_info("%8" PRIu64 "us ", max / 1000);
+   pr_info("%9.2fus ", (double)min / 1e3);
+   pr_info("%9.2fus ", (double)max / 1e3);
pr_info("%9.2fus ( +-%7.2f%% )", (double)etime / ecount/1e3,
kvm_event_rel_stddev(vcpu, event));
pr_info("\n");



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL 0/2] KVM: s390: Fixes for kvm/next (3.17)

2014-07-31 Thread Paolo Bonzini
Il 31/07/2014 09:59, Christian Borntraeger ha scritto:
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
> tags/kvm-s390-20140730

Thanks, applying.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Christoffer Dall
On Wed, Jul 09, 2014 at 02:55:12PM +0100, Alex Bennée wrote:
> To cleanly restore an SMP VM we need to ensure that the current pause
> state of each vcpu is correctly recorded. Things could get confused if
> the CPU starts running after migration restore completes when it was
> paused before it state was captured.
> 
> I've done this by exposing a register (currently only 1 bit used) via
> the GET/SET_ONE_REG logic to pass the state between KVM and the VM
> controller (e.g. QEMU).
> 
> Signed-off-by: Alex Bennée 
> ---
>  arch/arm64/include/uapi/asm/kvm.h |  8 +
>  arch/arm64/kvm/guest.c| 61 
> ++-
>  2 files changed, 68 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index eaf54a3..8990e6e 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -148,6 +148,14 @@ struct kvm_arch_memory_slot {
>  #define KVM_REG_ARM_TIMER_CNTARM64_SYS_REG(3, 3, 14, 3, 2)
>  #define KVM_REG_ARM_TIMER_CVAL   ARM64_SYS_REG(3, 3, 14, 0, 2)
>  
> +/* Power state (PSCI), not real registers */
> +#define KVM_REG_ARM_PSCI (0x0014 << KVM_REG_ARM_COPROC_SHIFT)
> +#define KVM_REG_ARM_PSCI_REG(n) \
> + (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_PSCI | \
> + (n & ~KVM_REG_ARM_COPROC_MASK))

I don't understand this mask, why isn't this
(n & 0x))

> +#define KVM_REG_ARM_PSCI_STATE  KVM_REG_ARM_PSCI_REG(0)
> +#define NUM_KVM_PSCI_REGS   1
> +

you're missing updates to Documentation/virtual/kvm/api.txt here.

>  /* Device Control API: ARM VGIC */
>  #define KVM_DEV_ARM_VGIC_GRP_ADDR0
>  #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS   1
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 205f0d8..31d6439 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -189,6 +189,54 @@ static int get_timer_reg(struct kvm_vcpu *vcpu, const 
> struct kvm_one_reg *reg)
>  }
>  
>  /**
> + * PSCI State
> + *
> + * These are not real registers as they do not actually exist in the
> + * hardware but represent the current power state of the vCPU

full stop

> + */
> +
> +static bool is_psci_reg(u64 index)
> +{
> + switch (index) {
> + case KVM_REG_ARM_PSCI_STATE:
> + return true;
> + }
> + return false;
> +}
> +
> +static int copy_psci_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> +{
> + if (put_user(KVM_REG_ARM_PSCI_STATE, uindices))
> + return -EFAULT;
> + return 0;
> +}
> +
> +static int set_psci_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> + void __user *uaddr = (void __user *)(long)reg->addr;
> + u64 val;
> + int ret;
> +
> + ret = copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id));
> + if (ret != 0)
> + return ret;
> +
> +vcpu->arch.pause = (val & 0x1) ? false : true;

tabs

I really need the documentation of the ABI, why is bit[0] == 1 not paused?

If we are not complaining when setting the pause value to false if it
was true before, then we probably also need to wake up the thread in
case this is called from another thread, right?

or perhaps we should just return an error if you're trying to un-pause a
CPU through this interface, h.

> + return 0;
> +}
> +
> +static int get_psci_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> + void __user *uaddr = (void __user *)(long)reg->addr;
> + u64 val;
> +
> +/* currently we only use one bit */

tabs

useless comment, just include docs.

> + val = vcpu->arch.pause ? 0 : 1;
> + return copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id));
> +}
> +
> +
> +/**
>   * kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
>   *
>   * This is for all registers.
> @@ -196,7 +244,7 @@ static int get_timer_reg(struct kvm_vcpu *vcpu, const 
> struct kvm_one_reg *reg)
>  unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
>  {
>   return num_core_regs() + kvm_arm_num_sys_reg_descs(vcpu)
> -+ NUM_TIMER_REGS;
> ++ NUM_TIMER_REGS + NUM_KVM_PSCI_REGS;
>  }
>  
>  /**
> @@ -221,6 +269,11 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 
> __user *uindices)
>   return ret;
>   uindices += NUM_TIMER_REGS;
>  
> +ret = copy_psci_indices(vcpu, uindices);

indentation, please use tabs.

> + if (ret)
> + return ret;
> + uindices += NUM_KVM_PSCI_REGS;
> +
>   return kvm_arm_copy_sys_reg_indices(vcpu, uindices);
>  }
>  
> @@ -237,6 +290,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct 
> kvm_one_reg *reg)
>   if (is_timer_reg(reg->id))
>   return get_timer_reg(vcpu, reg);
>  
> +if (is_psci_reg(reg->id))
> + return get_psci_reg(vcpu, reg);
> +
>   return kvm_arm_sys_reg_get_reg(vcpu, reg);
>  }
>  
> @@ -253,6 +309,9 @

Re: [PATCH 2/2] x86: kvm: do not advertise stable clocksource if CPU has TSC drift BUG

2014-07-31 Thread Paolo Bonzini
Il 31/07/2014 11:47, Igor Mammedov ha scritto:
> Signed-off-by: Igor Mammedov 
> ---
>  arch/x86/kvm/cpuid.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 38a0afe..f519823 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -478,8 +478,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>(1 << KVM_FEATURE_CLOCKSOURCE2) |
>(1 << KVM_FEATURE_ASYNC_PF) |
>(1 << KVM_FEATURE_PV_EOI) |
> -  (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
>(1 << KVM_FEATURE_PV_UNHALT);
> + if (!static_cpu_has_bug(X86_BUG_AMD_TSC_DRIFT))
> + entry->eax |= (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>  
>   if (sched_info_on())
>   entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> 

Marcelo, is there anything we can do if the VM migrates from a sane host
to a buggy one?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] KVM: perf/stat: Properly show submicrosecond times

2014-07-31 Thread Arnaldo Carvalho de Melo
Em Thu, Jul 31, 2014 at 08:24:03AM -0600, David Ahern escreveu:
> On 7/31/14, 5:13 AM, Christian Borntraeger wrote:
> >For lots of exits the min time (and sometimes max) is 0 or 1. Lets
> >increase the accurancy similar to what the average field alread does.
> 
> Seems reasonable to me.
> 
> Acked-by: David Ahern 

Thanks, applied.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Alex Bennée

Christoffer Dall writes:

> On Wed, Jul 09, 2014 at 02:55:12PM +0100, Alex Bennée wrote:
>> To cleanly restore an SMP VM we need to ensure that the current pause
>> state of each vcpu is correctly recorded. Things could get confused if
>> the CPU starts running after migration restore completes when it was
>> paused before it state was captured.
>> 

>> +/* Power state (PSCI), not real registers */
>> +#define KVM_REG_ARM_PSCI(0x0014 << KVM_REG_ARM_COPROC_SHIFT)
>> +#define KVM_REG_ARM_PSCI_REG(n) \
>> +(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_PSCI | \
>> + (n & ~KVM_REG_ARM_COPROC_MASK))
>
> I don't understand this mask, why isn't this
> (n & 0x))

I was trying to use the existing masks, but of course if anyone changes
that it would be an ABI change so probably not worth it.

>
>> +#define KVM_REG_ARM_PSCI_STATE  KVM_REG_ARM_PSCI_REG(0)
>> +#define NUM_KVM_PSCI_REGS   1
>> +
>
> you're missing updates to Documentation/virtual/kvm/api.txt here.

Will add.

>>  /* Device Control API: ARM VGIC */
>>  #define KVM_DEV_ARM_VGIC_GRP_ADDR   0
>>  #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS  1
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 205f0d8..31d6439 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -189,6 +189,54 @@ static int get_timer_reg(struct kvm_vcpu *vcpu, const 
>> struct kvm_one_reg *reg)
>>  }
>>  
>>  /**
>> + * PSCI State
>> + *
>> + * These are not real registers as they do not actually exist in the
>> + * hardware but represent the current power state of the vCPU
>
> full stop
>
>> + */
>> +
>> +static bool is_psci_reg(u64 index)
>> +{
>> +switch (index) {
>> +case KVM_REG_ARM_PSCI_STATE:
>> +return true;
>> +}
>> +return false;
>> +}
>> +
>> +static int copy_psci_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
>> +{
>> +if (put_user(KVM_REG_ARM_PSCI_STATE, uindices))
>> +return -EFAULT;
>> +return 0;
>> +}
>> +
>> +static int set_psci_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg 
>> *reg)
>> +{
>> +void __user *uaddr = (void __user *)(long)reg->addr;
>> +u64 val;
>> +int ret;
>> +
>> +ret = copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id));
>> +if (ret != 0)
>> +return ret;
>> +
>> +vcpu->arch.pause = (val & 0x1) ? false : true;
>
> tabs

Hmmm, apparently the GNU Emacs "linux" style doesn't actually enforce
that. Who knew? I'll beat the editor into submission.

> I really need the documentation of the ABI, why is bit[0] == 1 not
> paused?

I figured 1 == running, but I can switch it round if you want to to map
directly to the .pause flag.

> If we are not complaining when setting the pause value to false if it
> was true before, then we probably also need to wake up the thread in
> case this is called from another thread, right?
>
> or perhaps we should just return an error if you're trying to un-pause a
> CPU through this interface, h.

Wouldn't it be an error to mess with any register when the system is not
in a quiescent state? I was assuming that the wake state is dealt with
when the run loop finally restarts. 


>
> please check for use of tabs versus spaces, checkpatch.pl should
> complain.
>
> Can you add the 32-bit counterpart as part of this patch?

Same patch? Sure.
>
> Thanks,
> -Christoffer

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh

2014-07-31 Thread Borislav Petkov
On Thu, Jul 31, 2014 at 09:47:12AM +, Igor Mammedov wrote:
> Due to erratum #778 from
> "Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
>  Publication # 48931, Issue Date: May 2013, Revision: 3.10"
> 
> TSC on affected processor, a core may drift under certain conditions,
> which makes initially synchronized TSCs to become unsynchronized.

Is this something you're seeing on a real system? If so, how do you
trigger this?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][next] arm64: KVM: GICv3: move system register access to msr_s/mrs_s

2014-07-31 Thread Will Deacon
On Thu, Jul 31, 2014 at 02:32:27PM +0100, Christoffer Dall wrote:
> On Thu, Jul 31, 2014 at 02:19:47PM +0100, Will Deacon wrote:
> > On Thu, Jul 31, 2014 at 02:16:39PM +0100, Marc Zyngier wrote:
> > > Commit 72c583951526 (arm64: gicv3: Allow GICv3 compilation with
> > > older binutils) changed the way we express the GICv3 system registers,
> > > but couldn't change the occurences used by KVM as the code wasn't
> > > merged yet.
> > > 
> > > Just fix the accessors.
> > > 
> > > Cc: Will Deacon 
> > > Cc: Catalin Marinas 
> > > Cc: Christoffer Dall 
> > > Signed-off-by: Marc Zyngier 
> > > ---
> > > -next is currently borked. I suggest we take this patch via the kvmarm 
> > > tree,
> > > and only send our PR once the arm64 tree has hit Linus' tree. It contains
> > > the same dependency on the GIC tree anyway.
> > 
> > I'm fine with that as long as Paulo doesn't mind waiting for the arm64 stuff
> > to go in first (I have no reason to delay my pull request, so shouldn't be
> > an issue).
> > 
> > If it helps:
> > 
> >   Acked-by: Will Deacon 
> > 
> > I can't help but point out that we wouldn't be in this mess if you'd got
> > your stuff into -next sooner ;)
> > 
> 
> Well yes, but it was hard to plan our holidays around the deadlines and
> unfortunately I couldn't find time to verify the kvmarm branch to make
> it ready for -next inclusion before I went on holiday this time.  In any
> case, this should be the exception rather than the rule, and I do try to
> get the next branch ready as soon as possible usually.

Understood, but the communication could have been a lot better.

Whilst you were away, -next broke due to the non-virt parts of the GICv3
driver (assumedly going via the irqchip tree?):

  http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/275410.html

Catalin fixed that up, and then Jason stated that the intention was for
GICv3 to go via the arm64 tree (see above link). Based on that, we pulled
his tag into our tree and applied our fix on top.

Now a bunch of stuff has cropped up out of nowhere causing conflicts and
breakages for everybody ~3 days before the merge window opens (the code
defaults to 'y'). We're not mind-readers, so all it would have taken is
a mail to the relevant maintainers before everybody disappeared on holiday
saying (a) what the merge plan is and (b) what to do if everything goes
wrong while you're away.

Bah, maybe I'm just being grumpy...

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][next] arm64: KVM: GICv3: move system register access to msr_s/mrs_s

2014-07-31 Thread Christoffer Dall
On Thu, Jul 31, 2014 at 05:05:58PM +0100, Will Deacon wrote:
> On Thu, Jul 31, 2014 at 02:32:27PM +0100, Christoffer Dall wrote:
> > On Thu, Jul 31, 2014 at 02:19:47PM +0100, Will Deacon wrote:
> > > On Thu, Jul 31, 2014 at 02:16:39PM +0100, Marc Zyngier wrote:
> > > > Commit 72c583951526 (arm64: gicv3: Allow GICv3 compilation with
> > > > older binutils) changed the way we express the GICv3 system registers,
> > > > but couldn't change the occurences used by KVM as the code wasn't
> > > > merged yet.
> > > > 
> > > > Just fix the accessors.
> > > > 
> > > > Cc: Will Deacon 
> > > > Cc: Catalin Marinas 
> > > > Cc: Christoffer Dall 
> > > > Signed-off-by: Marc Zyngier 
> > > > ---
> > > > -next is currently borked. I suggest we take this patch via the kvmarm 
> > > > tree,
> > > > and only send our PR once the arm64 tree has hit Linus' tree. It 
> > > > contains
> > > > the same dependency on the GIC tree anyway.
> > > 
> > > I'm fine with that as long as Paulo doesn't mind waiting for the arm64 
> > > stuff
> > > to go in first (I have no reason to delay my pull request, so shouldn't be
> > > an issue).
> > > 
> > > If it helps:
> > > 
> > >   Acked-by: Will Deacon 
> > > 
> > > I can't help but point out that we wouldn't be in this mess if you'd got
> > > your stuff into -next sooner ;)
> > > 
> > 
> > Well yes, but it was hard to plan our holidays around the deadlines and
> > unfortunately I couldn't find time to verify the kvmarm branch to make
> > it ready for -next inclusion before I went on holiday this time.  In any
> > case, this should be the exception rather than the rule, and I do try to
> > get the next branch ready as soon as possible usually.
> 
> Understood, but the communication could have been a lot better.
> 
> Whilst you were away, -next broke due to the non-virt parts of the GICv3
> driver (assumedly going via the irqchip tree?):
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/275410.html
> 
> Catalin fixed that up, and then Jason stated that the intention was for
> GICv3 to go via the arm64 tree (see above link). Based on that, we pulled
> his tag into our tree and applied our fix on top.
> 
> Now a bunch of stuff has cropped up out of nowhere causing conflicts and
> breakages for everybody ~3 days before the merge window opens (the code
> defaults to 'y'). We're not mind-readers, so all it would have taken is
> a mail to the relevant maintainers before everybody disappeared on holiday
> saying (a) what the merge plan is and (b) what to do if everything goes
> wrong while you're away.
> 
> Bah, maybe I'm just being grumpy...
> 
Besides from postponing my vacation I'm not really sure what I could
have done differently.  In this thread (on which you were cc'ed) I
really tried to make people aware of what was coming:

https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010482.html

nobody replied to my last mail about something potentially messing stuff
up in the queue branch.

Maybe my fault was my reluctance to put patches that hadn't been able to
verify as working on a v8 platform into a branch that was pulled into
-next.  If I was being overly cautious in that sense, that's a fair
point and I can adjust future behavior for that.

Otherwise I think this was just an unfortunate series of events with a
lot of new stuff coming in combined with finding some nasty bugs.  Hey,
it happens.

Also, it's not like anyone sent me an e-mail saying URGENT, fix your
sh*t, in which case I would have dealt with it.

Seriously, if I'm being stupid, please tell me which e-mail I should
have sent to whom or what to have done differnently.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh

2014-07-31 Thread Paolo Bonzini
Il 31/07/2014 17:47, Borislav Petkov ha scritto:
> On Thu, Jul 31, 2014 at 09:47:12AM +, Igor Mammedov wrote:
>> Due to erratum #778 from
>> "Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
>>  Publication # 48931, Issue Date: May 2013, Revision: 3.10"
>>
>> TSC on affected processor, a core may drift under certain conditions,
>> which makes initially synchronized TSCs to become unsynchronized.
> 
> Is this something you're seeing on a real system? If so, how do you
> trigger this?

http://thread.gmane.org/gmane.linux.kernel/1748516 says that Ingo's
time-warp-test fails miserably on this machine.

(The test is at
http://people.redhat.com/mingo/time-warp-test/time-warp-test.c)

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 1/4] x86/kvm: Resolve some missing-initializers warnings

2014-07-31 Thread Rustad, Mark D
On Jul 31, 2014, at 4:50 AM, Paolo Bonzini  wrote:

> Il 30/07/2014 23:18, Mark D Rustad ha scritto:
>> Resolve some missing-initializers warnings that appear in W=2
>> builds. They are resolved by adding the name as a parameter to
>> the macros and having the macro generate all four fields of the
>> structure.
>> 
>> Signed-off-by: Mark Rustad 
>> Signed-off-by: Jeff Kirsher 
>> 
>> ---
>> V2: Change macro to supply all four fields instead of using a
>>designated initializer. Also fix up the array terminator.
>> ---
>> arch/x86/kvm/x86.c |   71 
>> ++--
>> 1 file changed, 36 insertions(+), 35 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index ef432f891d30..623aea52ceba 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -82,8 +82,9 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | 
>> EFER_LME | EFER_LMA));
>> static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>> #endif
>> 
>> -#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
>> -#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
>> +#define VM_STAT(name, x) name, offsetof(struct kvm, stat.x), KVM_STAT_VM, 
>> NULL
>> +#define VCPU_STAT(name, x) name, offsetof(struct kvm_vcpu, stat.x), \
>> +   KVM_STAT_VCPU, NULL
>> 
>> static void update_cr8_intercept(struct kvm_vcpu *vcpu);
>> static void process_nmi(struct kvm_vcpu *vcpu);
>> @@ -128,39 +129,39 @@ static struct kvm_shared_msrs_global __read_mostly 
>> shared_msrs_global;
>> static struct kvm_shared_msrs __percpu *shared_msrs;
>> 
>> struct kvm_stats_debugfs_item debugfs_entries[] = {
>> -{ "pf_fixed", VCPU_STAT(pf_fixed) },
>> -{ "pf_guest", VCPU_STAT(pf_guest) },
>> -{ "tlb_flush", VCPU_STAT(tlb_flush) },
>> -{ "invlpg", VCPU_STAT(invlpg) },
>> -{ "exits", VCPU_STAT(exits) },
>> -{ "io_exits", VCPU_STAT(io_exits) },
>> -{ "mmio_exits", VCPU_STAT(mmio_exits) },
>> -{ "signal_exits", VCPU_STAT(signal_exits) },
>> -{ "irq_window", VCPU_STAT(irq_window_exits) },
>> -{ "nmi_window", VCPU_STAT(nmi_window_exits) },
>> -{ "halt_exits", VCPU_STAT(halt_exits) },
>> -{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
>> -{ "hypercalls", VCPU_STAT(hypercalls) },
>> -{ "request_irq", VCPU_STAT(request_irq_exits) },
>> -{ "irq_exits", VCPU_STAT(irq_exits) },
>> -{ "host_state_reload", VCPU_STAT(host_state_reload) },
>> -{ "efer_reload", VCPU_STAT(efer_reload) },
>> -{ "fpu_reload", VCPU_STAT(fpu_reload) },
>> -{ "insn_emulation", VCPU_STAT(insn_emulation) },
>> -{ "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
>> -{ "irq_injections", VCPU_STAT(irq_injections) },
>> -{ "nmi_injections", VCPU_STAT(nmi_injections) },
>> -{ "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
>> -{ "mmu_pte_write", VM_STAT(mmu_pte_write) },
>> -{ "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
>> -{ "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) },
>> -{ "mmu_flooded", VM_STAT(mmu_flooded) },
>> -{ "mmu_recycled", VM_STAT(mmu_recycled) },
>> -{ "mmu_cache_miss", VM_STAT(mmu_cache_miss) },
>> -{ "mmu_unsync", VM_STAT(mmu_unsync) },
>> -{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
>> -{ "largepages", VM_STAT(lpages) },
>> -{ NULL }
>> +{ VCPU_STAT("pf_fixed", pf_fixed) },
>> +{ VCPU_STAT("pf_guest", pf_guest) },
>> +{ VCPU_STAT("tlb_flush", tlb_flush) },
>> +{ VCPU_STAT("invlpg", invlpg) },
>> +{ VCPU_STAT("exits", exits) },
>> +{ VCPU_STAT("io_exits", io_exits) },
>> +{ VCPU_STAT("mmio_exits", mmio_exits) },
>> +{ VCPU_STAT("signal_exits", signal_exits) },
>> +{ VCPU_STAT("irq_window", irq_window_exits) },
>> +{ VCPU_STAT("nmi_window", nmi_window_exits) },
>> +{ VCPU_STAT("halt_exits", halt_exits) },
>> +{ VCPU_STAT("halt_wakeup", halt_wakeup) },
>> +{ VCPU_STAT("hypercalls", hypercalls) },
>> +{ VCPU_STAT("request_irq", request_irq_exits) },
>> +{ VCPU_STAT("irq_exits", irq_exits) },
>> +{ VCPU_STAT("host_state_reload", host_state_reload) },
>> +{ VCPU_STAT("efer_reload", efer_reload) },
>> +{ VCPU_STAT("fpu_reload", fpu_reload) },
>> +{ VCPU_STAT("insn_emulation", insn_emulation) },
>> +{ VCPU_STAT("insn_emulation_fail", insn_emulation_fail) },
>> +{ VCPU_STAT("irq_injections", irq_injections) },
>> +{ VCPU_STAT("nmi_injections", nmi_injections) },
>> +{ VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped) },
>> +{ VM_STAT("mmu_pte_write", mmu_pte_write) },
>> +{ VM_STAT("mmu_pte_updated", mmu_pte_updated) },
>> +{ VM_STAT("mmu_pde_zapped", mmu_pde_zapped) },
>> +{ VM_STAT("mmu_flooded", mmu_flooded) },
>> +{ VM_STAT("mmu_recycled", mmu_recycled) },
>> +{ VM_STAT("mmu_cache_miss", mmu_cache_miss) },
>> +{ VM_STAT("mmu_unsync", mmu_unsync) },
>> +{ VM_STAT("remote_tlb_flush", remote_tlb_flush) },
>> +{ VM_STAT("l

Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Christoffer Dall
On Thu, Jul 31, 2014 at 04:14:51PM +0100, Alex Bennée wrote:
> 
> Christoffer Dall writes:
> 
> > On Wed, Jul 09, 2014 at 02:55:12PM +0100, Alex Bennée wrote:
> >> To cleanly restore an SMP VM we need to ensure that the current pause
> >> state of each vcpu is correctly recorded. Things could get confused if
> >> the CPU starts running after migration restore completes when it was
> >> paused before it state was captured.
> >> 
> 
> >> +/* Power state (PSCI), not real registers */
> >> +#define KVM_REG_ARM_PSCI  (0x0014 << KVM_REG_ARM_COPROC_SHIFT)
> >> +#define KVM_REG_ARM_PSCI_REG(n) \
> >> +  (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_PSCI | \
> >> + (n & ~KVM_REG_ARM_COPROC_MASK))
> >
> > I don't understand this mask, why isn't this
> > (n & 0x))
> 
> I was trying to use the existing masks, but of course if anyone changes
> that it would be an ABI change so probably not worth it.
> 

the KVM_REG_ARM_COPROC_MASK is part of the uapi IIRC, so that's not the
issue, but that mask doesn't cover all the upper bits, so it feels weird
to use that to me.

> >
> >> +#define KVM_REG_ARM_PSCI_STATE  KVM_REG_ARM_PSCI_REG(0)
> >> +#define NUM_KVM_PSCI_REGS   1
> >> +
> >
> > you're missing updates to Documentation/virtual/kvm/api.txt here.
> 
> Will add.
> 
> >>  /* Device Control API: ARM VGIC */
> >>  #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
> >>  #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS1
> >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> >> index 205f0d8..31d6439 100644
> >> --- a/arch/arm64/kvm/guest.c
> >> +++ b/arch/arm64/kvm/guest.c
> >> @@ -189,6 +189,54 @@ static int get_timer_reg(struct kvm_vcpu *vcpu, const 
> >> struct kvm_one_reg *reg)
> >>  }
> >>  
> >>  /**
> >> + * PSCI State
> >> + *
> >> + * These are not real registers as they do not actually exist in the
> >> + * hardware but represent the current power state of the vCPU
> >
> > full stop
> >
> >> + */
> >> +
> >> +static bool is_psci_reg(u64 index)
> >> +{
> >> +  switch (index) {
> >> +  case KVM_REG_ARM_PSCI_STATE:
> >> +  return true;
> >> +  }
> >> +  return false;
> >> +}
> >> +
> >> +static int copy_psci_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> >> +{
> >> +  if (put_user(KVM_REG_ARM_PSCI_STATE, uindices))
> >> +  return -EFAULT;
> >> +  return 0;
> >> +}
> >> +
> >> +static int set_psci_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg 
> >> *reg)
> >> +{
> >> +  void __user *uaddr = (void __user *)(long)reg->addr;
> >> +  u64 val;
> >> +  int ret;
> >> +
> >> +  ret = copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id));
> >> +  if (ret != 0)
> >> +  return ret;
> >> +
> >> +vcpu->arch.pause = (val & 0x1) ? false : true;
> >
> > tabs
> 
> Hmmm, apparently the GNU Emacs "linux" style doesn't actually enforce
> that. Who knew? I'll beat the editor into submission.
> 
> > I really need the documentation of the ABI, why is bit[0] == 1 not
> > paused?
> 
> I figured 1 == running, but I can switch it round if you want to to map
> directly to the .pause flag.
> 

don't care very deeply, just want something to describe it clearly.

> > If we are not complaining when setting the pause value to false if it
> > was true before, then we probably also need to wake up the thread in
> > case this is called from another thread, right?
> >
> > or perhaps we should just return an error if you're trying to un-pause a
> > CPU through this interface, h.
> 
> Wouldn't it be an error to mess with any register when the system is not
> in a quiescent state? I was assuming that the wake state is dealt with
> when the run loop finally restarts. 
> 

The ABI doesn't really define it as an error (the ABI doesn't enforce
anything right now) so the question is, does it ever make sense to clear
the pause flag through this ioctl?  If not, I think we should just err
on the side of caution and specify in the docs that this is not
supported and return an error.


> 
> >
> > please check for use of tabs versus spaces, checkpatch.pl should
> > complain.
> >
> > Can you add the 32-bit counterpart as part of this patch?
> 
> Same patch? Sure.

really up to you if you want to split it up into two patches, but I
think it's small enough that you can just create one patch.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Peter Maydell
On 31 July 2014 17:38, Christoffer Dall  wrote:
>> > If we are not complaining when setting the pause value to false if it
>> > was true before, then we probably also need to wake up the thread in
>> > case this is called from another thread, right?
>> >
>> > or perhaps we should just return an error if you're trying to un-pause a
>> > CPU through this interface, h.
>>
>> Wouldn't it be an error to mess with any register when the system is not
>> in a quiescent state? I was assuming that the wake state is dealt with
>> when the run loop finally restarts.
>>
>
> The ABI doesn't really define it as an error (the ABI doesn't enforce
> anything right now) so the question is, does it ever make sense to clear
> the pause flag through this ioctl?  If not, I think we should just err
> on the side of caution and specify in the docs that this is not
> supported and return an error.

Consider the case where the reset state of the system is
"CPU 0 running, CPUs 1..N stopped", and we're doing an
incoming migration to a state where all CPUs are running.
In that case we'll be using this ioctl to clear the pause flag,
right? (We'll also obviously need to set the PC and other
register state correctly before resuming the guest.)

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Christoffer Dall
On Thu, Jul 31, 2014 at 05:45:28PM +0100, Peter Maydell wrote:
> On 31 July 2014 17:38, Christoffer Dall  wrote:
> >> > If we are not complaining when setting the pause value to false if it
> >> > was true before, then we probably also need to wake up the thread in
> >> > case this is called from another thread, right?
> >> >
> >> > or perhaps we should just return an error if you're trying to un-pause a
> >> > CPU through this interface, h.
> >>
> >> Wouldn't it be an error to mess with any register when the system is not
> >> in a quiescent state? I was assuming that the wake state is dealt with
> >> when the run loop finally restarts.
> >>
> >
> > The ABI doesn't really define it as an error (the ABI doesn't enforce
> > anything right now) so the question is, does it ever make sense to clear
> > the pause flag through this ioctl?  If not, I think we should just err
> > on the side of caution and specify in the docs that this is not
> > supported and return an error.
> 
> Consider the case where the reset state of the system is
> "CPU 0 running, CPUs 1..N stopped", and we're doing an
> incoming migration to a state where all CPUs are running.
> In that case we'll be using this ioctl to clear the pause flag,
> right? (We'll also obviously need to set the PC and other
> register state correctly before resuming the guest.)
> 
Doh, you're right, I somehow had it in my mind that when you send the
thread a signal, the pause flag would be cleared, but that goes against
the whole idea of a CPU being turned off for KVM.

But wouldn't we then have to also wake up the thread when clearing the
pause flag?  It feels strange that the ioctl can clear the pause flag,
but keep the thread on a wake-queue, and then userspace has to send the
thread a signal of some sort to wake it up?

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Peter Maydell
On 31 July 2014 17:50, Christoffer Dall  wrote:
> On Thu, Jul 31, 2014 at 05:45:28PM +0100, Peter Maydell wrote:
>> Consider the case where the reset state of the system is
>> "CPU 0 running, CPUs 1..N stopped", and we're doing an
>> incoming migration to a state where all CPUs are running.
>> In that case we'll be using this ioctl to clear the pause flag,
>> right? (We'll also obviously need to set the PC and other
>> register state correctly before resuming the guest.)
>>
> Doh, you're right, I somehow had it in my mind that when you send the
> thread a signal, the pause flag would be cleared, but that goes against
> the whole idea of a CPU being turned off for KVM.
>
> But wouldn't we then have to also wake up the thread when clearing the
> pause flag?  It feels strange that the ioctl can clear the pause flag,
> but keep the thread on a wake-queue, and then userspace has to send the
> thread a signal of some sort to wake it up?

I have no idea about the implementation, I just know what
the user-facing ABI ought to look like. In particular userspace
definitely shouldn't have to send the thread any kind of
signal, it should just use KVM_RUN as usual and that should
cause the vCPU to either remain powered-down or start
executing code, as appropriate for the state we've just set.

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 1/4] x86/kvm: Resolve some missing-initializers warnings

2014-07-31 Thread Paolo Bonzini
Il 31/07/2014 18:35, Rustad, Mark D ha scritto:
> I agree it is ugly. .name = NULL would be enough to silence it. Would
> that be better? At the moment I am thinking of this as a test case
> for the other 1000 { } and {0} initializers in the kernel that are
> throwing warnings. I know we both agree that the compiler really
> shouldn't be warning on them, but they currently make a lot noise.
> 
> How would you feel about a macro called something like ZERO_ENTRY
> defined something like:
> 
> #define ZERO_ENTRY DIAG_PUSH DIAG_IGNORE(missing-field-initializers)
> { } DIAG_POP
> 
> Where the DIAG_ macros pretty much do what you think. I have another
> patch series that Jeff hasn't gotten to yet that defines such macros.
> Of course they get put to good use.
> 
> At this point, I'll put the terminator back the way it was, but I
> would still like your opinion on the macro approach to addressing all
> of these terminators.

If you get such a macro in include/linux, I will of course accept its usage.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Paolo Bonzini
Il 09/07/2014 15:55, Alex Bennée ha scritto:
> To cleanly restore an SMP VM we need to ensure that the current pause
> state of each vcpu is correctly recorded. Things could get confused if
> the CPU starts running after migration restore completes when it was
> paused before it state was captured.
> 
> I've done this by exposing a register (currently only 1 bit used) via
> the GET/SET_ONE_REG logic to pass the state between KVM and the VM
> controller (e.g. QEMU).
> 
> Signed-off-by: Alex Bennée 
> ---
>  arch/arm64/include/uapi/asm/kvm.h |  8 +
>  arch/arm64/kvm/guest.c| 61 
> ++-
>  2 files changed, 68 insertions(+), 1 deletion(-)

Since it's a pseudo register anyway, would it make sense to use the
existing KVM_GET/SET_MP_STATE ioctl interface?

How is this represented within QEMU in TCG mode?  Also, how is KVM/ARM
representing (and passing to QEMU) the halted state of the VCPU?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Peter Maydell
On 31 July 2014 17:57, Paolo Bonzini  wrote:
> Il 09/07/2014 15:55, Alex Bennée ha scritto:
>> To cleanly restore an SMP VM we need to ensure that the current pause
>> state of each vcpu is correctly recorded. Things could get confused if
>> the CPU starts running after migration restore completes when it was
>> paused before it state was captured.
>>
>> I've done this by exposing a register (currently only 1 bit used) via
>> the GET/SET_ONE_REG logic to pass the state between KVM and the VM
>> controller (e.g. QEMU).
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  arch/arm64/include/uapi/asm/kvm.h |  8 +
>>  arch/arm64/kvm/guest.c| 61 
>> ++-
>>  2 files changed, 68 insertions(+), 1 deletion(-)
>
> Since it's a pseudo register anyway, would it make sense to use the
> existing KVM_GET/SET_MP_STATE ioctl interface?

That appears to be an x86-specific thing relating to
IRQ chips.

> How is this represented within QEMU in TCG mode?

We don't implement it in TCG yet; Rob Herring has posted
patches but they had a few minor issues (didn't compile
on non-Linux hosts). The answer will be 'in a "bool powered_off"
flag in struct ARMCPU'.

> Also, how is KVM/ARM
> representing (and passing to QEMU) the halted state of the
> VCPU?

We don't. In ARM the equivalent of x86 HLT (which is
WFI, wait-for-interrupt) is allowed to resume at any time.
So we don't need to care about saving and restoring
whether we were sat in a WFI at point of migration.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 3/4] x86/kvm: Resolve shadow warnings in macro expansion

2014-07-31 Thread Mark D Rustad
Resolve shadow warnings that appear in W=2 builds. Instead of
using ret to hold the return pointer, save the length in a new
variable saved_len and compute the pointer on exit. This also
resolves a very technical error, in that ret was declared as
a const char *, when it really was a char * const, which
theoretically could have allowed the compiler to do something
wrong.

Signed-off-by: Mark Rustad 
Signed-off-by: Jeff Kirsher 

---
Changes in V2:
- Instead of renaming all inner variables, just delete the
  ret variable in favor of the new saved_len variable.
---
 arch/x86/kvm/mmutrace.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 9d2e0ffcb190..5aaf35641768 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -22,7 +22,7 @@
__entry->unsync = sp->unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({   \
-   const char *ret = p->buffer + p->len;   \
+   const u32 saved_len = p->len;   \
static const char *access_str[] = { \
"---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"  \
};  \
@@ -41,7 +41,7 @@
 role.nxe ? "" : "!",   \
 __entry->root_count,   \
 __entry->unsync ? "unsync" : "sync", 0);   \
-   ret;\
+   p->buffer + saved_len;  \
})
 
 #define kvm_mmu_trace_pferr_flags   \

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Paolo Bonzini
Il 31/07/2014 19:04, Peter Maydell ha scritto:
> On 31 July 2014 17:57, Paolo Bonzini  wrote:
>> Il 09/07/2014 15:55, Alex Bennée ha scritto:
>>> To cleanly restore an SMP VM we need to ensure that the current pause
>>> state of each vcpu is correctly recorded. Things could get confused if
>>> the CPU starts running after migration restore completes when it was
>>> paused before it state was captured.
>>>
>>> I've done this by exposing a register (currently only 1 bit used) via
>>> the GET/SET_ONE_REG logic to pass the state between KVM and the VM
>>> controller (e.g. QEMU).
>>>
>>> Signed-off-by: Alex Bennée 
>>> ---
>>>  arch/arm64/include/uapi/asm/kvm.h |  8 +
>>>  arch/arm64/kvm/guest.c| 61 
>>> ++-
>>>  2 files changed, 68 insertions(+), 1 deletion(-)
>>
>> Since it's a pseudo register anyway, would it make sense to use the
>> existing KVM_GET/SET_MP_STATE ioctl interface?
> 
> That appears to be an x86-specific thing relating to
> IRQ chips.

No, it's not.  It's just the state of the CPU, s390 will be using it too.

On x86 the states are uninitialized (UNINITIALIZED), stopped
(INIT_RECEIVED), running (RUNNABLE), halted (HALTED).  CPU 0 starts in
RUNNABLE state, other CPUs start in UNINITIALIZED state.  There are
x86-specific cases (uninitialized) and x86-isms (the INIT_RECEIVED
name), but the idea is widely applicable.

>> Also, how is KVM/ARM
>> representing (and passing to QEMU) the halted state of the
>> VCPU?
> 
> We don't. In ARM the equivalent of x86 HLT (which is
> WFI, wait-for-interrupt) is allowed to resume at any time.
> So we don't need to care about saving and restoring
> whether we were sat in a WFI at point of migration.

What does ARM do if you have a WFI while interrupts are disabled?  On
x86 after "cli;hlt" only an NMI will wake you up.  With spurious
wakeups, it's pretty much guaranteed that you will break such "cli;hlt"
sequences.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Peter Maydell
On 31 July 2014 18:21, Paolo Bonzini  wrote:
> What does ARM do if you have a WFI while interrupts are disabled?  On
> x86 after "cli;hlt" only an NMI will wake you up.  With spurious
> wakeups, it's pretty much guaranteed that you will break such "cli;hlt"
> sequences.

The architecture mandates some things that *must* wake you from
a WFI, but it also allows wakeups for other reasons not listed, or
for no reason at all. It's perfectly valid to implement WFI as a NOP
(though it would not be very good for power efficiency, obviously).
Guests which don't surround WFI with a "check whether we should
just go back to WFI" loop are buggy.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: export current vcpu->pause state via pseudo regs

2014-07-31 Thread Will Deacon
On Thu, Jul 31, 2014 at 06:36:35PM +0100, Peter Maydell wrote:
> On 31 July 2014 18:21, Paolo Bonzini  wrote:
> > What does ARM do if you have a WFI while interrupts are disabled?  On
> > x86 after "cli;hlt" only an NMI will wake you up.  With spurious
> > wakeups, it's pretty much guaranteed that you will break such "cli;hlt"
> > sequences.
> 
> The architecture mandates some things that *must* wake you from
> a WFI, but it also allows wakeups for other reasons not listed, or
> for no reason at all. It's perfectly valid to implement WFI as a NOP
> (though it would not be very good for power efficiency, obviously).
> Guests which don't surround WFI with a "check whether we should
> just go back to WFI" loop are buggy.

(and in case that wasn't clear, local_irq_disable() doesn't prevent an
interrupt from waking you up from wfi, otherwise our idle code would be
broken).

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Remove redundant and incorrect cpl check on task-switch

2014-07-31 Thread Nadav Amit
cc'ing the kvm mailing list that was mistakenly omitted.

On 7/30/14 9:57 AM, Nadav Amit wrote:
> Task-switch emulation checks the privilage level prior to performing the
> task-switch.  This check is incorrect in the case of task-gates, in which the
> tss.dpl is ignored, and can cause superfluous exceptions.  Moreover this check
> is unnecassary, since the CPU checks the privilage levels prior to exiting.
> Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly
> outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior
> to exiting.  AMD 15.14.1 says "The intercept is checked before the task switch
> takes place but after the incoming TSS and task gate (if one was involved) 
> have
> been checked for correctness."
> 
> This patch removes the CPL checks for CALL and JMP.
> 
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/kvm/emulate.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 56657b0..a9b2bd6 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2698,7 +2698,8 @@ static int emulator_do_task_switch(struct 
> x86_emulate_ctxt *ctxt,
>*
>* 1. jmp/call/int to task gate: Check against DPL of the task gate
>* 2. Exception/IRQ/iret: No check is performed
> -  * 3. jmp/call to TSS: Check against DPL of the TSS
> +  * 3. jmp/call to TSS/task-gate: No check is performed since the
> +  *hardware checks it before exiting.
>*/
>   if (reason == TASK_SWITCH_GATE) {
>   if (idt_index != -1) {
> @@ -2715,13 +2716,8 @@ static int emulator_do_task_switch(struct 
> x86_emulate_ctxt *ctxt,
>   if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl)
>   return emulate_gp(ctxt, (idt_index << 3) | 0x2);
>   }
> - } else if (reason != TASK_SWITCH_IRET) {
> - int dpl = next_tss_desc.dpl;
> - if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl)
> - return emulate_gp(ctxt, tss_selector);
>   }
>  
> -
>   desc_limit = desc_limit_scaled(&next_tss_desc);
>   if (!next_tss_desc.p ||
>   ((desc_limit < 0x67 && (next_tss_desc.type & 8)) ||
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: cluster mode broadcast does not work

2014-07-31 Thread Nadav Amit
cc'ing the kvm mailing list that was mistakenly omitted.

On 7/30/14 10:03 AM, Nadav Amit wrote:
> Local-apic enables cluster mode broadcast. As Intel SDM 10.6.2.2 says:
> "Broadcast to all local APICs is achieved by setting all destination bits to
> one." This patch enables cluster mode broadcast.
> 
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/kvm/lapic.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index efb1939..05c9a46 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -560,8 +560,8 @@ int kvm_apic_match_logical_addr(struct kvm_lapic *apic, 
> u8 mda)
>   result = 1;
>   break;
>   case APIC_DFR_CLUSTER:
> - if (((logical_id >> 4) == (mda >> 0x4))
> - && (logical_id & mda & 0xf))
> + if logical_id >> 4) == (mda >> 0x4))
> + && (logical_id & mda & 0xf)) || mda == 0xff)
>   result = 1;
>   break;
>   default:
> @@ -644,6 +644,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
>  
> + if (irq->dest_id == 0xff &&
> + kvm_apic_get_reg(src, APIC_DFR) == APIC_DFR_CLUSTER)
> + goto out; /* cluster mode broadcast */
> +
>   dst = map->logical_map[apic_cluster_id(map, mda)];
>  
>   bitmap = apic_logical_id(map, mda);
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Verifying Execution Integrity in Untrusted hypervisors

2014-07-31 Thread Shiva V
Jan Kiszka  siemens.com> writes:

> 
> On 2014-07-28 23:17, Nakajima, Jun wrote:
> > On Mon, Jul 28, 2014 at 1:27 PM, Paolo Bonzini  
redhat.com> wrote:
> >> Il 28/07/2014 20:31, Jan Kiszka ha scritto:
> >>> The hypervisor has full control of and insight into the guest vCPU
> >>> state. Only protecting some portions of guest memory seems 
insufficient.
> >>>
> >>> We rather need encryption of every data that leaves the CPU or moves
> >>> from guest to host mode (and decryption the other way around). I guess
> >>> that would have quite some performance impact and is far from being 
easy
> >>> to integrate into modern processors. But, who knows...
> >>
> >> Intel SGX sounds somewhat like what you describe, but I'm not sure how
> >> it's going to be virtualized.
> >>
> > 
> > Right. It's possible to virtualize (or pass-through) SGX without
> > losing the security feature.
> 
> Interesting thing. Somehow missed this so far. Fairly complicated one,
> though. Still trying to wrap my head around how attestation practically
> works.
> 
> > With SGX, you can create secure (encrypted) islands on processes in
> > VMs as well. But I'm not sure if it's useful for solving the problem
> > described.
> 
> Huh? I thought remote attestation is a key feature of SGX? That is, to
> my understanding, what Shiva is looking for (though on current hardware,
> which remains infeasible unfortunately).
> 
> Jan
> 

I was going through the Write xor Execute memory protection scheme and 
thought if this could be the solution.

I think if we lock down the code pages of the hypervisor and corresponding 
to the memory pages and then allow the handler to temporary unlock. 
(Assuming the operation is atomic). 

But I dont the security threats associated here. Can anyone point me in 
right direction?



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hang after seabios

2014-07-31 Thread Zetan Drableg
Hi Richard thanks for the info.
I took the strace approach and ran into this looping over and over again.
Is it failing to get time?

timer_gettime(0x8, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0x8, 0, {it_interval={0, 0}, it_value={0, 25}}, NULL) = 0
timer_gettime(0x8, {it_interval={0, 0}, it_value={0, 204443}}) = 0
select(16, [0 6 9 13 15], [], [], {1, 0}) = 2 (in [6 13], left {0, 98})
read(13, "\1\0\0\0\0\0\0\0", 4096)  = 8
read(13, 0x7fffa2ed3f70, 4096)  = -1 EAGAIN (Resource
temporarily unavailable)
read(6, "\0", 512)  = 1
read(6, 0x7fffa2ed4d70, 512)= -1 EAGAIN (Resource
temporarily unavailable)
select(16, [0 6 9 13 15], [], [], {1, 0}) = 1 (in [15], left {0, 98})
read(15, 
"\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0"...,
128) = 128
rt_sigaction(SIGALRM, NULL, {0x7f6d8b8d17d0, ~[KILL STOP RTMIN RT_1],
SA_RESTORER, 0x7f6d8b211710}, 8) = 0
write(7, "\0", 1)   = 1
write(14, "\1\0\0\0\0\0\0\0", 8)= 8
read(15, 0x7fffa2ed4ee0, 128)   = -1 EAGAIN (Resource
temporarily unavailable)
timer_gettime(0x8, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0x8, 0, {it_interval={0, 0}, it_value={0, 988758000}}, NULL) = 0
select(16, [0 6 9 13 15], [], [], {1, 0}) = 2 (in [6 13], left {0, 98})
read(13, "\1\0\0\0\0\0\0\0", 4096)  = 8
read(13, 0x7fffa2ed3f70, 4096)  = -1 EAGAIN (Resource
temporarily unavailable)
read(6, "\0", 512)  = 1
read(6, 0x7fffa2ed4d70, 512)= -1 EAGAIN (Resource
temporarily unavailable)

It looks a lot like the bug you filed here.
https://bugzilla.redhat.com/show_bug.cgi?id=553689

There are quite a few hits on this trace but nothing real specific.

-Zetan



On Thu, Jul 31, 2014 at 4:31 AM, Richard W.M. Jones  wrote:
> On Wed, Jul 30, 2014 at 03:58:43PM -0700, Zetan Drableg wrote:
>>[00183ms] /usr/libexec/qemu-kvm \
>>-global virtio-blk-pci.scsi=off \
>>-nodefconfig \
>>-nodefaults \
>>-nographic \
>>-machine accel=kvm:tcg \
>>-cpu host,+kvmclock \
>>-m 500 \
>>-no-reboot \
>>-kernel /var/tmp/.guestfs-0/kernel.47903 \
>>-initrd /var/tmp/.guestfs-0/initrd.47903 \
>>-device virtio-scsi-pci,id=scsi \
>>-drive file=/tmp/libguestfs-test-tool-sda-Iakpwe,cache=none,format
>>=raw,id=hd0,if=none \
>>-device scsi-hd,drive=hd0 \
>>-drive file=/var/tmp/.guestfs-0/root.47903,snapshot=on,id=appliance,
>>if=none,cache=unsafe \
>>-device scsi-hd,drive=appliance \
>>-device virtio-serial \
>>-serial stdio \
>>-device sga \
>>-chardev socket,path=/tmp/libguestfspx9994/guestfsd.sock,id=channel0
>>\
>>-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0
>>\
>>-append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check
>>acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0
>>guestfs_verbose=1 TERM=xterm edd=off'
>>\x1b[1;256r\x1b[256;256H\x1b[6n
>>Google, Inc.
>>Serial Graphics Adapter 10/14/11
>>SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild@
>>ca-build44.us.oracle.com) Fri Oct 14 20:04:36 UTC 2011
>>Term: 80x24
>>4 0
>>SeaBIOS (version seabios-0.6.1.2-28.el6)
>>\x1b[2J
>>
>> At this point it hangs forever.
>
> These hangs can be tricky to diagnose.
>
> There are a couple of things you can do however:
>
> (1) strace qemu to find out what it is doing, or:
>
> (2) [harder, but much more informative] gdb into the guest to find out
> where the guest hangs, or:
>
> (3) Take the command line above, and cut it down to try to isolate the
> problematic options.  I would concentrate on the following options as
> being most likely to cause trouble:
>
>  -cpu
>  -machine
>  -kernel
>  -device sga (remove it)
>
> For (1) and (2) you can use a qemu wrapper to modify the qemu command
> that the test tool runs.
>
> See also:
>
> http://libguestfs.org/guestfs.3.html#qemu-wrappers
>
> http://rwmj.wordpress.com/2011/10/12/tip-debugging-the-early-boot-process-with-qemu-and-gdb/#content
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-builder quickly builds VMs from scratch
> http://libguestfs.org/virt-builder.1.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hang after seabios

2014-07-31 Thread Richard W.M. Jones
On Thu, Jul 31, 2014 at 12:12:26PM -0700, Zetan Drableg wrote:
> Hi Richard thanks for the info.
> I took the strace approach and ran into this looping over and over again.
> Is it failing to get time?
> 
> timer_gettime(0x8, {it_interval={0, 0}, it_value={0, 0}}) = 0
> timer_settime(0x8, 0, {it_interval={0, 0}, it_value={0, 25}}, NULL) = 0
> timer_gettime(0x8, {it_interval={0, 0}, it_value={0, 204443}}) = 0
> select(16, [0 6 9 13 15], [], [], {1, 0}) = 2 (in [6 13], left {0, 98})
> read(13, "\1\0\0\0\0\0\0\0", 4096)  = 8
> read(13, 0x7fffa2ed3f70, 4096)  = -1 EAGAIN (Resource
> temporarily unavailable)
> read(6, "\0", 512)  = 1
> read(6, 0x7fffa2ed4d70, 512)= -1 EAGAIN (Resource
> temporarily unavailable)
> select(16, [0 6 9 13 15], [], [], {1, 0}) = 1 (in [15], left {0, 98})
> read(15, 
> "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0"...,
> 128) = 128
> rt_sigaction(SIGALRM, NULL, {0x7f6d8b8d17d0, ~[KILL STOP RTMIN RT_1],
> SA_RESTORER, 0x7f6d8b211710}, 8) = 0
> write(7, "\0", 1)   = 1
> write(14, "\1\0\0\0\0\0\0\0", 8)= 8
> read(15, 0x7fffa2ed4ee0, 128)   = -1 EAGAIN (Resource
> temporarily unavailable)
> timer_gettime(0x8, {it_interval={0, 0}, it_value={0, 0}}) = 0
> timer_settime(0x8, 0, {it_interval={0, 0}, it_value={0, 988758000}}, NULL) = 0
> select(16, [0 6 9 13 15], [], [], {1, 0}) = 2 (in [6 13], left {0, 98})
> read(13, "\1\0\0\0\0\0\0\0", 4096)  = 8
> read(13, 0x7fffa2ed3f70, 4096)  = -1 EAGAIN (Resource
> temporarily unavailable)
> read(6, "\0", 512)  = 1
> read(6, 0x7fffa2ed4d70, 512)= -1 EAGAIN (Resource
> temporarily unavailable)
> 
> It looks a lot like the bug you filed here.
> https://bugzilla.redhat.com/show_bug.cgi?id=553689

I'm fairly sure this is just qemu running as normal.  It's not a
duplicate of that ancient bug, because you can see from the messages
that SeaBIOS is running.

You need to `gdb' into the guest to see where the emulation got to.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] arm64: bump MAX_MASTER_STREAMIDS from 16 to 32

2014-07-31 Thread Joel Schopp
I recently ran into a situation where I needed more than 16 stream ids
for an smmu on an AMD SOC, but we are currently limited to 16 by:

#define MAX_MASTER_STREAMIDSMAX_PHANDLE_ARGS
#define MAX_PHANDLE_ARGS 16

I expect others will run into this in the future as more advanced SOCs
start to come out. The resulting one line patch has been tested to fix
the problem.

Cc: Grant Likely 
Cc: Rob Herring 
Cc: Will Deacon 
Signed-off-by: Joel Schopp 
---
 include/linux/of.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/of.h b/include/linux/of.h
index 196b34c..70ad4d9 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -67,7 +67,7 @@ struct device_node {
 #endif
 };
 
-#define MAX_PHANDLE_ARGS 16
+#define MAX_PHANDLE_ARGS 32
 struct of_phandle_args {
struct device_node *np;
int args_count;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-31 Thread Scott Wood
On Thu, 2014-07-31 at 01:15 -0500, Bhushan Bharat-R65777 wrote:
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Thursday, July 31, 2014 8:18 AM
> > To: Bhushan Bharat-R65777
> > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder 
> > Stuart-
> > B08248
> > Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and 
> > exception
> > 
> > On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote:
> > >
> > > > -Original Message-
> > > > From: Wood Scott-B07421
> > > > Sent: Tuesday, July 29, 2014 3:58 AM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org;
> > > > Yoder Stuart-
> > > > B08248
> > > > Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers
> > > > and exception
> > > >
> > > >  Userspace might be interested in
> > > > the raw value,
> > >
> > > With the current design, If userspace is interested then it will not
> > > get the DBSR.
> > 
> > Oh, because DBSR isn't currently implemented in sregs or one reg?
> 
> That is one reason. Another is that if we give dbsr visibility to
> userspace then userspace have to clear dbsr in handling KVM_EXIT_DEBUG.

Right -- since I didn't realize DBSR wasn't already exposed, I thought
userspace already had this responsibility.

> > It looked like it was removing dbsr visibility and the requirement for 
> > userspace
> > to clear dbsr.  I guess the old way was that the value in
> > vcpu->arch.dbsr didn't matter until the next debug exception, when it
> > would be overwritten by the new SPRN_DBSR?
> 
> But that means old dbsr will be visibility to userspace, which is even bad 
> than not visible, no?
> 
> Also this can lead to old dbsr visible to guest once userspace releases
> debug resources, but this can be solved by clearing dbsr in
> kvm_arch_vcpu_ioctl_set_guest_debug() -> " if (!(dbg->control &
> KVM_GUESTDBG_ENABLE)) { }".

I wasn't suggesting that you keep it that way, just clarifying my
understanding of the current code.


> 
> > 
> > > > > + case SPRN_DBCR2:
> > > > > + /*
> > > > > +  * If userspace is debugging guest then guest
> > > > > +  * can not access debug registers.
> > > > > +  */
> > > > > + if (vcpu->guest_debug)
> > > > > + break;
> > > > > +
> > > > > + debug_inst = true;
> > > > > + vcpu->arch.dbg_reg.dbcr2 = spr_val;
> > > > > + vcpu->arch.shadow_dbg_reg.dbcr2 = spr_val;
> > > > >   break;
> > > >
> > > > In what circumstances can the architected and shadow registers differ?
> > >
> > > As of now they are same. But I think that if we want to implement other
> > features like "Freeze Timer (FT)" then they can be different.
> > 
> > I don't think we can possibly implement Freeze Timer.
> 
> May be, but in my opinion we should keep this open.

We're not talking about API here -- the implementation should be kept
simple if there's no imminent need for shadow registers.

> > > I am not sure what we should in that case ?
> > >
> > > As we are currently emulating a subset of debug events (IAC, DAC, IC,
> > > BT and TIE --- DBCR0 emulation) then we should expose status of those
> > > events in guest dbsr and rest should be cleared ?
> > 
> > I'm not saying they need to be exposed to the guest, but I don't see where 
> > you
> > filter out bits like these.
> 
> I am trying to get what all bits should be filtered out, all bits
> except IACx, DACx, IC, BT and TIE (same as event set filtering done
> when setting DBCR0) ? 
> 
> i.e IDE, UDE, MRR, IRPT, RET, CIRPT, CRET should be filtered out?

Bits like IRPT and RET don't really matter, as you shouldn't see them
happen.  Likewise MRR if you're sure you've cleared it since boot.  But
IDE could be set any time an asynchronous exception happens.  I don't
think you should filter it out, but instead make sure that it doesn't
cause an exception to be delivered.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Integrity in untrusted environments

2014-07-31 Thread Shiva V
Hello,
 I am exploring ideas to implement a service inside a virtual machine on 
untrusted hypervisors under current cloud infrastructures.
 Particularly, I am interested how one can verify the integrity of the 
service in an environment where hypervisor is not trusted. This is my setup.

1. I have two virtual machines. (Normal client VM's).
2. VM-A is executing a service and VM-B wants to verify its integrity.
3. Both are executing on untrusted hypervisor.

Though, Intel SGX will solve this, by using the concept of enclaves, its not 
publicly available yet.

One could also use SMM to verify the integrity. But since this is time based 
approach, one could easily exploit between the time window.

I was drilling down this idea, We know Write xor Execute Memory Protection 
Scheme. Using this idea,If we could lock down the VM-A memory pages where 
the service is running and also corresponding page-table entries, then have 
a handler code that temporarily unlocks them for legitimate updates, then 
one could verify the integrity of the service running. 

Since if attacker needs to inject some malicious code, he needs to update 
the page tables and if this is locked down, he will not be able to inject 
arbitrary code without notice. Since the unmodified hypervisor handler will 
not handle this situation.  

But here are my questions:

1. Is write xor execute feasible solution in cloud environments? Since this 
scheme fails or could be exploited if there is double mapping. So, what if 
there is mapping from different vm's to same physical memory? Will this 
fail?

2. what are the security threats involved if one proceeds with this scheme?

Any help in this regard is greatly appreciated.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrity in untrusted environments

2014-07-31 Thread Nakajima, Jun
On Thu, Jul 31, 2014 at 2:25 PM, Shiva V  wrote:
> Hello,
>  I am exploring ideas to implement a service inside a virtual machine on
> untrusted hypervisors under current cloud infrastructures.
>  Particularly, I am interested how one can verify the integrity of the
> service in an environment where hypervisor is not trusted. This is my setup.
>
> 1. I have two virtual machines. (Normal client VM's).
> 2. VM-A is executing a service and VM-B wants to verify its integrity.
> 3. Both are executing on untrusted hypervisor.
>
> Though, Intel SGX will solve this, by using the concept of enclaves, its not
> publicly available yet.

Just clarification. The concept of enclaves and the specs of Intel SGX
are available in public.

See the following, for example:
https://software.intel.com/en-us/intel-isa-extensions

-- 
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrity in untrusted environments

2014-07-31 Thread Shiva V
Nakajima, Jun  intel.com> writes:

 
> On Thu, Jul 31, 2014 at 2:25 PM, Shiva V  
gmail.com> wrote:
Hello,
 I am exploring ideas to implement a service inside a virtual machine on
 untrusted hypervisors under current cloud infrastructures.
Particularly, I am interested how one can verify the integrity of the
service in an environment where hypervisor is not trusted. This is my 
setup.

1. I have two virtual machines. (Normal client VM's).
2. VM-A is executing a service and VM-B wants to verify its integrity.
 3. Both are executing on untrusted hypervisor.

 Though, Intel SGX will solve this, by using the concept of enclaves, its 
not
 publicly available yet.

 Just clarification. The concept of enclaves and the specs of Intel SGX
are available in public.
 
> See the following, for example:
> https://software.intel.com/en-us/intel-isa-extensions

Thanks for the reply. By mentioning Not publicly available, 
I meant that the Intel SGX processors are not available in market yet.




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation

2014-07-31 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-07-31:
> Il 31/07/2014 10:03, Wanpeng Li ha scritto:
>>> One thing:
>>> 
 +  if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
 +  vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>> 
>>> I think you can just do this write unconditionally, since most
>>> hypervisors will enable this.  Also, you probably can add the tpr
>> 
>> What will happen if a hypervisor doesn't enable it? I make it more
>> cleaner in version two.
> 
> TPR_THRESHOLD will be likely written as zero, but the processor will
> never use it anyway.  It's just a small optimization because
> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Theoretically, you are right. But we should not expect all VMMs follow it. It 
is not worth to violate the SDM just for saving two or three instructions' cost.

> 
> Paolo
> 
>>> threshold field to the read-write fields for shadow VMCS.
>> 
>> Agreed.
>> 
>> Regards,
>> Wanpeng Li


Best regards,
Yang


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Query: Is it possible to lose interrupts between vhost and virtio_net during migration?

2014-07-31 Thread Zhangjie (HZ)
Thanks,MST! :-)
I will change the order back and have a test again.

On 2014/7/31 22:37, Michael S. Tsirkin wrote:
> On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
>> On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
>>> [The test scenario]:
>>>
>>> Doing migration between two Hosts roundly(A->B, B->A) ,after about 20 
>>> times, network of the VM is unreachable.
>>> There are other 20 VMs in each Host, and they send ipv4 or ipv6 and 
>>> multicast packets to each other.
>>> Sometimes the CPU idle of the Host maybe 0;
>>>
>>> [Problem description]:
>>>
>>> I wonder if it was interrupts missing that cause the network unreachable.
>>> In the migration process of kvm, source end should suspend, which include 
>>> steps as follows:
>>> 1.  do_vm_stop->pause_all_vcpus
>>> 2.  vm_state_notify-> 
>>> vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
>>> 3.  vm_state_notify-> vhost_net_stop-> 
>>> vhost_net_stop_one->OST_NET_SET_BACKEND-> vhost_net_flush_vq-> 
>>> vhost_work_flush
>>> This may cause interrupts missing. Supose the scene that, 
>>> virtqueue_notify() is called in virtio_net,
>>> then the VM is paused. And, just before the portiowrite being handled, 
>>> eventfd of kvm is released.
>>> Then, vhost could not sense the notify, and the tx notify is lost.
>>> On the other side, if eventfd of kvm is released just after vhost_notify(), 
>>> and before eventfd_signal(), then rx signal by vhost is lost.
>>
>> Could be a bug in userspace: should should cleanups notifiers
>> after it stops vhost.
>>
>> Could you please send this to appropriate mailing lists?
>> I have a policy against off-list discussions.
> 
> Also, Jason, could you take a look please?
> Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
> changed the order of stopping the device.
> Previously vhost_dev_stop would disable backend and only afterwards,
> unset guest notifiers.  You now unset guest notifiers while vhost is still
> active. Looks like this can lose events?
> 
> 
> 
>> -- 
>> MST
> .
> 

-- 
Best Wishes!
Zhang Jie

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] powerpc/kvm: support to handle sw breakpoint

2014-07-31 Thread Madhavan Srinivasan
This patch adds kernel side support for software breakpoint.
Design is that, by using an illegal instruction, we trap to hypervisor
via Emulation Assistance interrupt, where we check for the illegal instruction
and accordingly we return to Host or Guest. Patch also adds support for
software breakpoint in PR KVM.

Changes v2->v3:
 Changed the debug instructions. Using the all zero opcode in the instruction 
word
  as illegal instruction as mentioned in Power ISA instead of ABS
 Removed reg updated in emulation assist and added a call to
  kvmppc_emulate_instruction for reg update.

Changes v1->v2:

 Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM can also 
share it.
 Added code to use KVM get one reg infrastructure to get debug opcode.
 Updated emulate.c to include emulation of debug instruction incase of PR_KVM.
 Made changes to commit message.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/kvm_book3s.h |  7 +++
 arch/powerpc/include/asm/ppc-opcode.h |  5 +
 arch/powerpc/kvm/book3s.c |  3 ++-
 arch/powerpc/kvm/book3s_hv.c  | 12 ++--
 arch/powerpc/kvm/book3s_pr.c  |  3 +++
 arch/powerpc/kvm/emulate.c|  9 +
 6 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index f52f656..f17e3fd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -24,6 +24,13 @@
 #include 
 #include 
 
+/*
+ * KVMPPC_INST_BOOK3S_DEBUG is debug Instruction for supporting Software 
Breakpoint.
+ * Based on PowerISA v2.07, Instruction with opcode 0s will be treated as 
illegal
+ * instruction.
+ */
+#define KVMPPC_INST_BOOK3S_DEBUG   0x0000
+
 struct kvmppc_bat {
u64 raw;
u32 bepi;
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 3132bb9..56739b3 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -111,6 +111,11 @@
 #define OP_31_XOP_LHBRX 790
 #define OP_31_XOP_STHBRX918
 
+/* KVMPPC_INST_BOOK3S_DEBUG -- Software breakpoint Instruction
+ * 0x0000 -- Primary opcode is 0s
+ */
+#define OP_ZERO 0x0
+
 #define OP_LWZ  32
 #define OP_LD   58
 #define OP_LWZU 33
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..b40fe5d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
 {
-   return -EINVAL;
+   vcpu->guest_debug = dbg->control;
+   return 0;
 }
 
 void kvmppc_decrementer_func(unsigned long data)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..7c16f4f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -725,8 +725,13 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
 * we don't emulate any guest instructions at this stage.
 */
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-   r = RESUME_GUEST;
+   if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG) {
+   kvmppc_emulate_instruction(run, vcpu);
+   r = RESUME_HOST;
+   } else {
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   r = RESUME_GUEST;
+   }
break;
/*
 * This occurs if the guest (kernel or userspace), does something that
@@ -831,6 +836,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 
id,
long int i;
 
switch (id) {
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_BOOK3S_DEBUG);
+   break;
case KVM_REG_PPC_HIOR:
*val = get_reg_val(id, 0);
break;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 8eef1e5..27f5234 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1229,6 +1229,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, 
u64 id,
int r = 0;
 
switch (id) {
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_BOOK3S_DEBUG);
+   break;
case KVM_REG_PPC_HIOR:
*val = get_reg_val(id, to_book3s(vcpu)->hior);
break;
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index da86d9b..d95014e 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -458,6 +458,15 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct 
kvm_vcpu *vcpu)
kvmppc_set_gpr(vcpu, ra, 

Re: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation

2014-07-31 Thread Paolo Bonzini
Il 01/08/2014 02:57, Zhang, Yang Z ha scritto:
> > TPR_THRESHOLD will be likely written as zero, but the processor will
> > never use it anyway.  It's just a small optimization because
> > nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.
> 
> Theoretically, you are right. But we should not expect all VMMs
> follow it. It is not worth to violate the SDM just for saving two or
> three instructions' cost.

Yes, you do need an "if (cpu_has_vmx_tpr_shadow())" around the
vmcs_write32.  But still, checking nested_cpu_has is not strictly
necessary.  Right now they both are a single AND, but I have plans to
change all of the cpu_has_*() checks to static keys.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation

2014-07-31 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-08-01:
> Il 01/08/2014 02:57, Zhang, Yang Z ha scritto:
>>> TPR_THRESHOLD will be likely written as zero, but the processor
>>> will never use it anyway.  It's just a small optimization because
>>> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always
> be true.
>> 
>> Theoretically, you are right. But we should not expect all VMMs
>> follow it. It is not worth to violate the SDM just for saving two or
>> three instructions' cost.
> 
> Yes, you do need an "if (cpu_has_vmx_tpr_shadow())" around the
> vmcs_write32.  But still, checking nested_cpu_has is not strictly necessary.
> Right now they both are a single AND, but I have plans to change all
> of the
> cpu_has_*() checks to static keys.

See v2 patch. It isn't a problem anymore.

> 
> Paolo


Best regards,
Yang


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrity in untrusted environments

2014-07-31 Thread Paolo Bonzini
Il 31/07/2014 23:25, Shiva V ha scritto:
> Hello,
>  I am exploring ideas to implement a service inside a virtual machine on 
> untrusted hypervisors under current cloud infrastructures.
>  Particularly, I am interested how one can verify the integrity of the 
> service in an environment where hypervisor is not trusted. This is my setup.
> 
> 1. I have two virtual machines. (Normal client VM's).
> 2. VM-A is executing a service and VM-B wants to verify its integrity.
> 3. Both are executing on untrusted hypervisor.
> 
> Though, Intel SGX will solve this, by using the concept of enclaves, its not 
> publicly available yet.
> 
> One could also use SMM to verify the integrity. But since this is time based 
> approach, one could easily exploit between the time window.
> 
> I was drilling down this idea, We know Write xor Execute Memory Protection 
> Scheme. Using this idea,If we could lock down the VM-A memory pages where 
> the service is running and also corresponding page-table entries, then have 
> a handler code that temporarily unlocks them for legitimate updates, then 
> one could verify the integrity of the service running. 

You can make a malicious hypervisor that makes all executable pages also
writable, but hides the fact to the running process.  But really, if you
control the hypervisor you can just write to guest memory as you wish.

SMM will be emulated by the hypervisor.

If the hypervisor is untrusted, you cannot solve _everything_.  For the
third time, what attacks are you trying to protect from?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html