from:"Andrew Cooper"

Re: [Xen-devel] [PATCH 13/16] SUPPORT.md: Add secondary memory management features

2017-11-23 Thread Andrew Cooper

On 23/11/17 12:00, George Dunlap wrote:
> On 11/23/2017 11:55 AM, Olaf Hering wrote:
>> On Thu, Nov 23, Olaf Hering wrote:
>>
>>> On Thu, Nov 23, Jan Beulich wrote:
 Olaf, are you still playing with it every now and then?
>>> No, I have not tried it since I last touched it.
>> I just tried it, and it failed:
>>
>> root@stein-schneider:~ # /usr/lib/xen/bin/xenpaging -d 7 -f /dev/shm/p -v
>> xc: detail: xenpaging init
>> xc: detail: watching '/local/domain/7/memory/target-tot_pages'
>> xc: detail: Failed allocation for dom 7: 1 extents of order 0
>> xc: error: Failed to populate ring gfn
>>  (16 = Device or resource busy): Internal error
> That looks like just a memory allocation.  Do you use autoballooning
> dom0?  Maybe try ballooning dom0 down first?

Its not that.  This failure comes from the ring living inside the p2m,
and has already been found with introspection.

When a domain has ballooned exactly to its allocation, it is not
possible to attach a vmevent/sharing/paging ring, because attaching the
ring requires an add_to_physmap.  In principle, the toolstack could bump
the allocation by one frame, but that's racy with the guest trying to
claim the frame itself.

Pauls work to allow access to pages not in the p2m is the precursor to
fixing this problem, after which the rings move out of the guest
(reduction in attack surface), and there is nothing the guest can do to
inhibit toolstack/privileged operations like this.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/16] SUPPORT.md: Add secondary memory management features

2017-11-23 Thread Andrew Cooper

On 23/11/17 12:45, Olaf Hering wrote:
> On Thu, Nov 23, Andrew Cooper wrote:
>
>> Its not that.  This failure comes from the ring living inside the p2m,
>> and has already been found with introspection.
> In my case it was just a wrong domid. Now I use 'xl domid domU' and
> xenpaging does something. It seems paging out and in works still to some
> degree.  But it still/again needs lots of testing and fixing.
>
> I get errors like this, and xl dmesg has also errors:
>
> ...
> xc: detail: populate_page < gfn 10100 pageslot 127
> xc: detail: Need to resume 200 pages to reach 131328 target_tot_pages
> xc: detail: Got event from evtchn
> xc: detail: populate_page < gfn 10101 pageslot 128
> xenforeignmemory: error: mmap failedxc: : Invalid argument
> detail: populate_page < gfn 10102 pageslot 129
> xc: detail: populate_page < gfn 10103 pageslot 130
> xc: detail: populate_page < gfn 10104 pageslot 131
> ...
>
> ...
> (XEN) vm_event.c:289:d0v0 d2v0 was not paused.
> (XEN) vm_event.c:289:d0v0 d2v0 was not paused.
> (XEN) vm_event.c:289:d0v2 d2v2 was not paused.
> ...

Hmm ok.  Either way, I think this demonstrates that the feature is not
of "Tech Preview" quality.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/HVM: fix hvmemul_rep_outs_set_context()

2017-11-23 Thread Andrew Cooper

On 23/11/17 15:09, Jan Beulich wrote:
> There were two issues with this function: Its use of
> hvmemul_do_pio_buffer() was wrong (the function deals only with
> individual port accesses, not repeated ones, i.e. passing it
> "*reps * bytes_per_rep" does not have the intended effect). And it
> could have processed a larger set of operations in one go than was
> probably intended (limited just by the size that xmalloc() can hand
> back).
>
> By converting to proper use of hvmemul_do_pio_buffer(), no intermediate
> buffer is needed at all. As a result a preemption check is being added.
>
> Also drop unused parameters from the function.
>
> Signed-off-by: Jan Beulich 

While this does look like real bug, and bugfix, it isn't the issue I'm
hitting.  I've distilled the repro scenario down to a tiny XTF test,
which is just a `rep outsb` with a buffer which crosses a page boundary.

The results are reliably:

(d1) --- Xen Test Framework ---
(d1) Environment: HVM 32bit (No paging)
(d1) Test hvm-print
(d1) String crossing a page boundary
(XEN) MMIO emulation failed (1): d1v0 32bit @ 0010:001032b0 -> 5e c3 8d
b4 26 00 00 00 00 8d bc 27 00 00 00 00
(d1) Test result: SUCCESS

The Port IO hits a retry because of hitting the page boundary, and the
retry logic successes, as evident by all data hitting hvm_print_line(). 
Somewhere however, the PIO turns into MMIO, and a failure is reported
after the PIO completed successfully.  %rip in the failure message
points after the `rep outsb`, rather than at it.

If anyone has any ideas, I'm all ears.  If not, I will try to find some
time to look deeper into the issue.

~Andrew
>From 9141a36374f52434a291e3be41bd259cfb9bda72 Mon Sep 17 00:00:00 2001
From: Andrew Cooper 
Date: Thu, 23 Nov 2017 18:31:40 +
Subject: [PATCH] MMIO failure trigger

---
 tests/hvm-print/Makefile |  9 +
 tests/hvm-print/main.c   | 40 
 2 files changed, 49 insertions(+)
 create mode 100644 tests/hvm-print/Makefile
 create mode 100644 tests/hvm-print/main.c

diff --git a/tests/hvm-print/Makefile b/tests/hvm-print/Makefile
new file mode 100644
index 000..c70bede
--- /dev/null
+++ b/tests/hvm-print/Makefile
@@ -0,0 +1,9 @@
+include $(ROOT)/build/common.mk
+
+NAME  := hvm-print
+CATEGORY  := utility
+TEST-ENVS := hvm32
+
+obj-perenv += main.o
+
+include $(ROOT)/build/gen.mk
diff --git a/tests/hvm-print/main.c b/tests/hvm-print/main.c
new file mode 100644
index 000..882b716
--- /dev/null
+++ b/tests/hvm-print/main.c
@@ -0,0 +1,40 @@
+/**
+ * @file tests/hvm-print/main.c
+ * @ref test-hvm-print
+ *
+ * @page test-hvm-print hvm-print
+ *
+ * @todo Docs for test-hvm-print
+ *
+ * @see tests/hvm-print/main.c
+ */
+#include 
+
+const char test_title[] = "Test hvm-print";
+
+static char buf[2 * PAGE_SIZE] __page_aligned_bss;
+
+void test_main(void)
+{
+char *ptr = &buf[4090];
+size_t len;
+
+strcpy(ptr, "String crossing a page boundary\n");
+len = strlen(ptr);
+
+asm volatile("rep; outsb"
+ : "+S" (ptr), "+c" (len)
+ : "d" (0xe9));
+
+xtf_success(NULL);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.1.4

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-next 07/16] xen/arm: Introduce copy_to_guest_phys_flush_dcache

2017-11-23 Thread Andrew Cooper

On 23/11/17 18:32, Julien Grall wrote:
> This new function will be used in a follow-up patch to copy data to the guest
> using the IPA (aka guest physical address) and then clean the cache.
>
> Signed-off-by: Julien Grall 
> ---
>  xen/arch/arm/guestcopy.c   | 10 ++
>  xen/include/asm-arm/guest_access.h |  6 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
> index be53bee559..7958663970 100644
> --- a/xen/arch/arm/guestcopy.c
> +++ b/xen/arch/arm/guestcopy.c
> @@ -110,6 +110,16 @@ unsigned long raw_copy_from_guest(void *to, const void 
> __user *from, unsigned le
>COPY_from_guest | COPY_linear);
>  }
>  
> +unsigned long copy_to_guest_phys_flush_dcache(struct domain *d,
> +  paddr_t gpa,
> +  void *buf,
> +  unsigned int len)
> +{
> +/* P2M is shared between all vCPUs, so the vCPU used does not matter. */

Be very careful with this line of thinking.  It is only works after
DOMCTL_max_vcpus has succeeded, and before that point, it is a latent
NULL pointer dereference.

Also, what about vcpus configured with alternative views?

~Andrew

> +return copy_guest(buf, gpa, len, d->vcpu[0],
> +  COPY_to_guest | COPY_ipa | COPY_flush_dcache);
> +}
> +
>  int access_guest_memory_by_ipa(struct domain *d, paddr_t gpa, void *buf,
> uint32_t size, bool is_write)
>  {
> diff --git a/xen/include/asm-arm/guest_access.h 
> b/xen/include/asm-arm/guest_access.h
> index 6796801cfe..224d2a033b 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -11,6 +11,12 @@ unsigned long raw_copy_to_guest_flush_dcache(void *to, 
> const void *from,
>  unsigned long raw_copy_from_guest(void *to, const void *from, unsigned len);
>  unsigned long raw_clear_guest(void *to, unsigned len);
>  
> +/* Copy data to guest physical address, then clean the region. */
> +unsigned long copy_to_guest_phys_flush_dcache(struct domain *d,
> +  paddr_t phys,
> +  void *buf,
> +  unsigned int len);
> +
>  int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> uint32_t size, bool is_write);
>  


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.10] x86: Avoid corruption on migrate for vcpus using CPUID Faulting

2017-11-25 Thread Andrew Cooper

Xen 4.8 and later virtualises CPUID Faulting support for guests.  However, the
value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
that the current cpuid faulting setting is lost on migrate/suspend/resume.

To move this MSR, use the new guest_{rd,wr}msr() infrastructure.  This avoids
duplicating or opencoding the feature check and value logic, as well as
abstracting away the internal value representation.  One small adjustment to
guest_wrmsr() is required to cope with being called in toolstack context.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Julien Grall 

This needs backporting to 4.8 and later, and therefore should be considered
for 4.10 at this point.
---
 xen/arch/x86/domctl.c | 46 +-
 xen/arch/x86/hvm/hvm.c| 40 +++-
 xen/arch/x86/msr.c|  3 ++-
 xen/include/asm-x86/msr.h |  3 +++
 4 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 80b4df9..e98c4a3 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1286,7 +1286,7 @@ long arch_do_domctl(
 struct xen_domctl_vcpu_msrs *vmsrs = &domctl->u.vcpu_msrs;
 struct xen_domctl_vcpu_msr msr;
 struct vcpu *v;
-uint32_t nr_msrs = 0;
+uint32_t nr_msrs = 1;
 
 ret = -ESRCH;
 if ( (vmsrs->vcpu >= d->max_vcpus) ||
@@ -1311,10 +1311,49 @@ long arch_do_domctl(
 vmsrs->msr_count = nr_msrs;
 else
 {
+static const uint32_t msrs[] = {
+MSR_INTEL_MISC_FEATURES_ENABLES,
+};
+unsigned int j;
+
 i = 0;
 
 vcpu_pause(v);
 
+for ( j = 0; j < ARRAY_SIZE(msrs); ++j )
+{
+uint64_t val;
+int rc = guest_rdmsr(v, msrs[j], &val);
+
+/*
+ * It is the programmers responsibility to ensure that
+ * msrs[] contain generally-readable MSRs.
+ * X86EMUL_EXCEPTION here implies a missing feature.
+ */
+if ( rc == X86EMUL_EXCEPTION )
+continue;
+
+if ( rc != X86EMUL_OKAY )
+{
+ASSERT_UNREACHABLE();
+ret = -ENXIO;
+break;
+}
+
+if ( !val )
+continue; /* Skip empty MSRs. */
+
+if ( i < vmsrs->msr_count && !ret )
+{
+msr.index = msrs[j];
+msr.reserved = 0;
+msr.value = val;
+if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
+ret = -EFAULT;
+}
+++i;
+}
+
 if ( boot_cpu_has(X86_FEATURE_DBEXT) )
 {
 unsigned int j;
@@ -1375,6 +1414,11 @@ long arch_do_domctl(
 
 switch ( msr.index )
 {
+case MSR_INTEL_MISC_FEATURES_ENABLES:
+if ( guest_wrmsr(v, msr.index, msr.value) != X86EMUL_OKAY )
+break;
+continue;
+
 case MSR_AMD64_DR0_ADDRESS_MASK:
 if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
  (msr.value >> 32) )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c765a5e..7f18f3b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1322,11 +1322,14 @@ static int hvm_load_cpu_xsave_states(struct domain *d, 
hvm_domain_context_t *h)
 }
 
 #define HVM_CPU_MSR_SIZE(cnt) offsetof(struct hvm_msr, msr[cnt])
-static unsigned int __read_mostly msr_count_max;
+static unsigned int __read_mostly msr_count_max = 1;
 
 static int hvm_save_cpu_msrs(struct domain *d, hvm_domain_context_t *h)
 {
 struct vcpu *v;
+static const uint32_t msrs[] = {
+MSR_INTEL_MISC_FEATURES_ENABLES,
+};
 
 for_each_vcpu ( d, v )
 {
@@ -1340,6 +1343,32 @@ static int hvm_save_cpu_msrs(struct domain *d, 
hvm_domain_context_t *h)
 ctxt = (struct hvm_msr *)&h->data[h->cur];
 ctxt->count = 0;
 
+for ( i = 0; i < ARRAY_SIZE(msrs); ++i )
+{
+uint64_t val;
+int rc = guest_rdmsr(v, msrs[i], &val);
+
+/*
+ * It is the programmers responsibility to ensure that msrs[]
+ * contain generally-readable MSRs.  X86EMUL_EXCEPTION here
+ * implies a missing feature.
+ */
+if ( rc == X86EMUL_EXCEPTION )
+continue;
+
+if ( rc != X86EMUL_OKAY )
+

Re: [Xen-devel] [PATCH for-4.10] x86: Avoid corruption on migrate for vcpus using CPUID Faulting

2017-11-27 Thread Andrew Cooper

On 27/11/17 09:53, Jan Beulich wrote:
 On 25.11.17 at 19:15,  wrote:
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -1286,7 +1286,7 @@ long arch_do_domctl(
>>  struct xen_domctl_vcpu_msrs *vmsrs = &domctl->u.vcpu_msrs;
>>  struct xen_domctl_vcpu_msr msr;
>>  struct vcpu *v;
>> -uint32_t nr_msrs = 0;
>> +uint32_t nr_msrs = 1;
> I think this ought to be ARRAY_SIZE(), to avoid a
> possible update of one but not the other. The price to pay (moving
> the array outwards) is reasonable imo.

Ok.

>  However, ...
>
>> @@ -1311,10 +1311,49 @@ long arch_do_domctl(
>>  vmsrs->msr_count = nr_msrs;
>>  else
>>  {
>> +static const uint32_t msrs[] = {
>> +MSR_INTEL_MISC_FEATURES_ENABLES,
> ... is this really a non-optional MSR? In particular,
> calculate_hvm_max_policy() ties it to INTEL_PLATFORM_INFO,
> which in turn is being tied to running on an Intel CPU.
> calculate_pv_max_policy() is even more picky. I think you want
> to introduce a function in msr.c complementing guest_rdmsr() /
> guest_wrmsr(), similar to HVM's .init_msr() hook.

Whether an MSR should be migrated depends on whether the guest is able
to use it.

I'm specifically looking to remove the concept of optional MSRs to avoid
duplicating the MSR policy logic, and risking it being different.  In
reality, what this means is that the migration code will be expected to
cope with the union of all possible MSRs, and only actually get a subset
to put into the stream.

Also, this MSR specifically is about to become common, but that is
post-4.10 at this point.

>
>> +};
>> +unsigned int j;
>> +
>>  i = 0;
>>  
>>  vcpu_pause(v);
>>  
>> +for ( j = 0; j < ARRAY_SIZE(msrs); ++j )
>> +{
>> +uint64_t val;
>> +int rc = guest_rdmsr(v, msrs[j], &val);
>> +
>> +/*
>> + * It is the programmers responsibility to ensure that
>> + * msrs[] contain generally-readable MSRs.
>> + * X86EMUL_EXCEPTION here implies a missing feature.
>> + */
>> +if ( rc == X86EMUL_EXCEPTION )
>> +continue;
>> +
>> +if ( rc != X86EMUL_OKAY )
>> +{
>> +ASSERT_UNREACHABLE();
>> +ret = -ENXIO;
>> +break;
>> +}
>> +
>> +if ( !val )
>> +continue; /* Skip empty MSRs. */
>> +
>> +if ( i < vmsrs->msr_count && !ret )
>> +{
>> +msr.index = msrs[j];
>> +msr.reserved = 0;
>> +msr.value = val;
>> +if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
>> +ret = -EFAULT;
>> +}
>> +++i;
>> +}
>> +
>>  if ( boot_cpu_has(X86_FEATURE_DBEXT) )
>>  {
>>  unsigned int j;
> With the j you introduce in the next outer scope, I think this one
> should  be removed.

Oh, indeed.  I hadn't even noticed this one here.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] x86: replace bad ASSERT() in xenmem_add_to_physmap_one()

2017-11-27 Thread Andrew Cooper

On 27/11/17 09:11, Jan Beulich wrote:
> There are no locks being held, i.e. it is possible to be triggered by
> racy hypercall invocations. Subsequent code doesn't really depend on the
> checked values, so this is not a security issue.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

> ---
> I'm up for to better suggestions for the EXDEV I've used; I certainly
> don't want to use -EINVAL, and -EAGAIN (afaik) generally is meant as a
> hint to re-invoke with the same arguments (which wouldn't help here).

I can't offer a better suggestion.

>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4143,8 +4143,12 @@ int xenmem_add_to_physmap_one(
>  /* Unmap from old location, if any. */
>  old_gpfn = get_gpfn_from_mfn(mfn_x(mfn));
>  ASSERT( old_gpfn != SHARED_M2P_ENTRY );
> -if ( space == XENMAPSPACE_gmfn || space == XENMAPSPACE_gmfn_range )
> -ASSERT( old_gpfn == gfn );
> +if ( (space == XENMAPSPACE_gmfn || space == XENMAPSPACE_gmfn_range) &&
> + old_gpfn != gfn )
> +{
> +rc = -EXDEV;
> +goto put_both;
> +}
>  if ( old_gpfn != INVALID_M2P_ENTRY )
>  rc = guest_physmap_remove_page(d, _gfn(old_gpfn), mfn, 
> PAGE_ORDER_4K);
>  
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] x86: check paging mode earlier in xenmem_add_to_physmap_one()

2017-11-27 Thread Andrew Cooper

On 27/11/17 09:12, Jan Beulich wrote:
> There's no point in deferring this until after some initial processing,
> and it's actively wrong for the XENMAPSPACE_gmfn_foreign handling to not
> have such a check at all.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] improve XENMEM_add_to_physmap_batch address checking

2017-11-27 Thread Andrew Cooper

On 27/11/17 09:12, Jan Beulich wrote:
> As a follow-up to XSA-212 we should have addressed a similar issue here:
> The handles being advanced at the top of xenmem_add_to_physmap_batch()
> means we allow hypervisor space accesses (in particular, for "errs",
> writes) with suitably crafted input arguments. This isn't a security
> issue in this case because of the limited width of struct
> xen_add_to_physmap_batch's size field: It being 16-bits wide, only the
> r/o M2P area can be accessed. Still we can and should do better.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/HVM: fix interaction between internal and extern emulation

2017-11-27 Thread Andrew Cooper

On 27/11/17 08:28, Jan Beulich wrote:
> handle_hvm_io_completion() is being involved in resuming from requests
> sent to a device model only, while re-invocation of internally handled
> I/O which couldn't be handled in one go simply re-starts the affected
> instruction. When an internally handled split request is being followed
> by one sent to a device model, so far nothing reset vio->io_completion,
> leading to an MMIO emulation attempt on the next instruction _after_ the
> one succesfully sent to qemu if that one doesn't itself require
> completion handling.
>
> Since only repeated string instructions are affected, strictly speaking
> the adjustment to handle_pio() isn't needed. Do it nevertheless for
> consistency as well as to avoid the lack thereof becoming an issue in
> the future; put the main change in generic enough a place to also cover
> VMX real mode emulation.
>
> Reported-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 for-4.10] x86: Avoid corruption on migrate for vcpus using CPUID Faulting

2017-11-27 Thread Andrew Cooper

Xen 4.8 and later virtualises CPUID Faulting support for guests.  However, the
value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
that the current cpuid faulting setting is lost on migrate/suspend/resume.

To move this MSR, use the new guest_{rd,wr}msr() infrastructure.  This avoids
duplicating or opencoding the feature check and value logic, as well as
abstracting away the internal value representation.  One small adjustment to
guest_wrmsr() is required to cope with being called in toolstack context.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Julien Grall 

v2:
 * Move the msrs[] array to an outer scope and derive the number to send with
   ARRAY_SIZE()
 * Don't create a shadowed j variable in arch_do_domctl()

This needs backporting to 4.8 and later, and therefore should be considered
for 4.10 at this point.
---
 xen/arch/x86/domctl.c | 49 ---
 xen/arch/x86/hvm/hvm.c| 41 ++-
 xen/arch/x86/msr.c|  3 ++-
 xen/include/asm-x86/msr.h |  3 +++
 4 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 80b4df9..1ddd3d0 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1286,7 +1286,10 @@ long arch_do_domctl(
 struct xen_domctl_vcpu_msrs *vmsrs = &domctl->u.vcpu_msrs;
 struct xen_domctl_vcpu_msr msr;
 struct vcpu *v;
-uint32_t nr_msrs = 0;
+static const uint32_t msrs_to_send[] = {
+MSR_INTEL_MISC_FEATURES_ENABLES,
+};
+uint32_t nr_msrs = ARRAY_SIZE(msrs_to_send);
 
 ret = -ESRCH;
 if ( (vmsrs->vcpu >= d->max_vcpus) ||
@@ -1311,14 +1314,49 @@ long arch_do_domctl(
 vmsrs->msr_count = nr_msrs;
 else
 {
+unsigned int j;
+
 i = 0;
 
 vcpu_pause(v);
 
-if ( boot_cpu_has(X86_FEATURE_DBEXT) )
+for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
 {
-unsigned int j;
+uint64_t val;
+int rc = guest_rdmsr(v, msrs_to_send[j], &val);
+
+/*
+ * It is the programmers responsibility to ensure that
+ * msrs[] contain generally-read/write MSRs.
+ * X86EMUL_EXCEPTION here implies a missing feature, and
+ * that the guest doesn't have access to the MSR.
+ */
+if ( rc == X86EMUL_EXCEPTION )
+continue;
+
+if ( rc != X86EMUL_OKAY )
+{
+ASSERT_UNREACHABLE();
+ret = -ENXIO;
+break;
+}
+
+if ( !val )
+continue; /* Skip empty MSRs. */
 
+if ( i < vmsrs->msr_count && !ret )
+{
+msr.index = msrs_to_send[j];
+msr.reserved = 0;
+msr.value = val;
+if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
+ret = -EFAULT;
+}
+++i;
+}
+
+if ( boot_cpu_has(X86_FEATURE_DBEXT) )
+{
 if ( v->arch.pv_vcpu.dr_mask[0] )
 {
 if ( i < vmsrs->msr_count && !ret )
@@ -1375,6 +1413,11 @@ long arch_do_domctl(
 
 switch ( msr.index )
 {
+case MSR_INTEL_MISC_FEATURES_ENABLES:
+if ( guest_wrmsr(v, msr.index, msr.value) != X86EMUL_OKAY )
+break;
+continue;
+
 case MSR_AMD64_DR0_ADDRESS_MASK:
 if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
  (msr.value >> 32) )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c765a5e..ec3dc48 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1322,7 +1322,10 @@ static int hvm_load_cpu_xsave_states(struct domain *d, 
hvm_domain_context_t *h)
 }
 
 #define HVM_CPU_MSR_SIZE(cnt) offsetof(struct hvm_msr, msr[cnt])
-static unsigned int __read_mostly msr_count_max;
+static const uint32_t msrs_to_send[] = {
+MSR_INTEL_MISC_FEATURES_ENABLES,
+};
+static unsigned int __read_mostly msr_count_max = ARRAY_SIZE(msrs_to_send);
 
 static int hvm_save_cpu_msrs(struct domain *d, hvm_domain_context_t *h)
 {
@@ -1340,6 +1343,33 @@ static int hvm_save_cpu_msrs(struct domain *d, 
hvm_domain_context_t *h)
 ctxt = (struct hvm_msr *)&h->data[h->cur];
 ctxt->count = 0;
 
+for ( i = 0; i < ARRAY_SIZE(msrs_to_send

Re: [Xen-devel] [PATCH] x86/setup: do not relocate below the end of current Xen image placement

2017-11-27 Thread Andrew Cooper

On 27/11/17 15:41, Daniel Kiper wrote:
> If it is possible we would like to have the Xen image higher than the
> booloader put it and certainly do not overwrite the Xen code and data
> during copy/relocation. Otherwise the Xen may crash silently at boot.
>
> Signed-off-by: Daniel Kiper 

Actually, there is a second related bug which I've only just got to the
bottom of, and haven't had time to fix yet.

(XEN) Bootloader: GRUB 2.02
(XEN) Command line: hvm_fep com1=115200,8n1 console=com1,vga
dom0_mem=2048M,max:2048M watchdog ucode=scan dom0_max_vcpus=8
crashkernel=192M,below=4G
(XEN) Xen image load base address: 0xaba0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009bc00 (usable)
(XEN)  0009bc00 - 000a (reserved)
(XEN)  000e - 0010 (reserved)
(XEN)  0010 - ac209000 (usable)
(XEN)  ac209000 - aeb99000 (reserved)
(XEN)  aeb99000 - aeb9d000 (ACPI NVS)
(XEN)  aeb9d000 - aecd1000 (reserved)
(XEN)  aecd1000 - aecd4000 (ACPI NVS)
(XEN)  aecd4000 - aecf5000 (reserved)
(XEN)  aecf5000 - aecf6000 (ACPI NVS)
(XEN)  aecf6000 - aed24000 (reserved)
(XEN)  aed24000 - aef2f000 (ACPI NVS)
(XEN)  aef2f000 - aefed000 (ACPI data)
(XEN)  aefed000 - af00 (usable)
(XEN)  af00 - b000 (reserved)
(XEN)  f800 - fc00 (reserved)
(XEN)  fec0 - fec01000 (reserved)
(XEN)  fed19000 - fed1a000 (reserved)
(XEN)  fed1c000 - fed2 (reserved)
(XEN)  fee0 - fee01000 (reserved)
(XEN)  ff40 - 0001 (reserved)
(XEN)  0001 - 00085000 (usable)
(XEN) Kdump: DISABLED (failed to reserve 192MB (196608kB) at 0xa020)
(XEN) ACPI: RSDP 000F0410, 0024 (r2 INTEL )

When booting with Grub2 capable of locating Xen at its preferred
location, the calculation for the kexec region fails, because the
current location of Xen isn't taken into account.

The end of the chosen kexec region (0xa020 + 0x0c00) overlaps
with the bottom of the Xen image.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/9] x86/vvmx: Extract operand reading logic into operand_read()

2017-11-27 Thread Andrew Cooper

On 27/11/17 17:01, Jan Beulich wrote:
 On 26.10.17 at 19:03,  wrote:
>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>> @@ -361,6 +361,40 @@ static void reg_write(struct cpu_user_regs *regs,
>>  *pval = value;
>>  }
>>  
>> +static int operand_read(void *buf, struct vmx_inst_op *op,
>> +struct cpu_user_regs *regs, unsigned int bytes)
>> +{
>> +if ( op->type == VMX_INST_MEMREG_TYPE_REG )
>> +{
>> +switch ( bytes )
>> +{
>> +case 4:
>> +*(uint32_t *)buf = reg_read(regs, op->reg_idx);
>> +
>> +case 8:
>> +*(uint64_t *)buf = reg_read(regs, op->reg_idx);
>> +
>> +default:
>> +ASSERT_UNREACHABLE();
>> +return X86EMUL_UNHANDLEABLE;
>> +}
> Looks like there are two missing break statements up there.

I fixed these up x86-next.  (Although it was only because Coverity
noticed and complained at me...)

>
>> +return X86EMUL_OKAY;
> This and ...
>
>> +}
>> +else
>> +{
>> +pagefault_info_t pfinfo;
>> +int rc = hvm_copy_from_guest_linear(buf, op->mem, bytes, 0, 
>> &pfinfo);
>> +
>> +if ( rc == HVMTRANS_bad_linear_to_gfn )
>> +hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>> +if ( rc != HVMTRANS_okay )
>> +return X86EMUL_EXCEPTION;
>> +
>> +return X86EMUL_OKAY;
> ... this should become a uniform ...
>
>> +}
> ... return here.

I tried this, but later patches in the series need them to move back. 
Overall, this layout reduces the code churn across the series.

>
>> @@ -440,7 +474,12 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
>>  decode->op[0].type = VMX_INST_MEMREG_TYPE_REG;
>>  decode->op[0].reg_idx = info.fields.reg1;
>>  if ( poperandS != NULL )
>> -*poperandS = reg_read(regs, decode->op[0].reg_idx);
>> +{
>> +int rc = operand_read(poperandS, &decode->op[0], regs,
>> +  decode->op[0].len);
>> +if ( rc != X86EMUL_OKAY )
> Blank line between declaration(s) and statement(s) please.

I can fix this.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] XSM: add Kconfig option to override bootloader provided policy

2017-11-28 Thread Andrew Cooper

On 28/11/17 18:06, Tamas K Lengyel wrote:
> From: Tamas K Lengyel 
>
> Currently the built-in XSM policy only gets used if there is no other policy
> specified during boot. In this patch we add a Kconfig option to specify to 
> only
> use built-in policy during boot. This is particularly important when booting
> Xen through the shim to ensure the XSM policy gets measured and that it can't
> be replaced by another unmeasured policy by the bootloader. Note that the XSM
> policy can still be updated after boot (from dom0 for example) if the built-in
> policy allows it.
>
> Signed-off-by: Tamas K Lengyel 
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> Cc: Daniel De Graaf 
> Cc: ope...@googlegroups.com
> ---
>  xen/common/Kconfig | 14 ++
>  xen/xsm/xsm_core.c |  2 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index 103ef44cb5..5ad0d03f37 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -140,6 +140,20 @@ config XSM_POLICY
>  
> If unsure, say Y.
>  
> +config XSM_POLICY_OVERRIDE
> + bool "Built-in security policy overrides bootloader provided policy"

The overall change certainly looks good and it is obvious why it is a
benefit.  However, text/functionality like this is cognitively hard to
follow, and _OVERRIDE isn't obviously as to its functionality at a glance.

Wouldn't it be better to have XSM_BOOTLOADER_POLICY (or possibly
XSM_ALLOW_?), which defaults to y, and can be forced off for extra security?

~Andrew

> + default n
> + depends on XSM && XSM_POLICY
> + ---help---
> +   Set this option to 'Y' to have the hypervisor ignore the security
> +   policy provided by the bootloader, and use ONLY the built-in
> +   security policy.
> +
> +   This can be used to ensure only verified security policies are
> +   loaded during boot time.
> +
> +   If unsure, say N.
> +
>  config LATE_HWDOM
>   bool "Dedicated hardware domain"
>   default n
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point

2017-11-28 Thread Andrew Cooper

On 28/11/17 19:34, Maran Wilson wrote:
> For certain applications it is desirable to rapidly boot a KVM virtual
> machine. In cases where legacy hardware and software support within the
> guest is not needed, Qemu should be able to boot directly into the
> uncompressed Linux kernel binary without the need to run firmware.
>
> There already exists an ABI to allow this for Xen PVH guests and the ABI is
> supported by Linux and FreeBSD:
>
>https://xenbits.xen.org/docs/unstable/misc/hvmlite.html

Just FYI, this link has recently become stale, following some cleanup. 
The document is now:

https://xenbits.xen.org/docs/unstable/misc/pvh.html

~Andrew

>
> This PoC patch enables Qemu to use that same entry point for booting KVM
> guests.
>
> Even though the code is still PoC quality, I'm sending this as an RFC now
> since there are a number of different ways the specific implementation
> details can be handled. I chose a shared code path for Xen and KVM guests
> but could just as easily create a separate code path that is advertised by
> a different ELF note for KVM. There also seems to be some flexibility in
> how the e820 table data is passed and how (or if) it should be identified
> as e820 data. As a starting point, I've chosen the options that seem to
> result in the smallest patch with minimal to no changes required of the
> x86/HVM direct boot ABI.
> ---
>  arch/x86/xen/enlighten_pvh.c | 74 
> 
>  1 file changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
> index 98ab176..d93f711 100644
> --- a/arch/x86/xen/enlighten_pvh.c
> +++ b/arch/x86/xen/enlighten_pvh.c
> @@ -31,21 +31,46 @@ static void xen_pvh_arch_setup(void)
>   acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM;
>  }
>  
> -static void __init init_pvh_bootparams(void)
> +static void __init init_pvh_bootparams(bool xen_guest)
>  {
>   struct xen_memory_map memmap;
>   int rc;
>  
>   memset(&pvh_bootparams, 0, sizeof(pvh_bootparams));
>  
> - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table);
> - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table);
> - rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> - if (rc) {
> - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc);
> - BUG();
> + if (xen_guest) {
> + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table);
> + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table);
> + rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> + if (rc) {
> + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc);
> + BUG();
> + }
> + pvh_bootparams.e820_entries = memmap.nr_entries;
> + } else if (pvh_start_info.nr_modules > 1) {
> + /* The second module should be the e820 data for KVM guests */
> + struct hvm_modlist_entry *modaddr;
> + char e820_sig[] = "e820 data";
> + struct boot_e820_entry *ep;
> + struct e820_table *tp;
> + char *cmdline_str;
> + int idx;
> +
> + modaddr = __va(pvh_start_info.modlist_paddr +
> +sizeof(struct hvm_modlist_entry));
> + cmdline_str = __va(modaddr->cmdline_paddr);
> +
> + if ((modaddr->cmdline_paddr) &&
> + (!strncmp(e820_sig, cmdline_str, sizeof(e820_sig {
> + tp = __va(modaddr->paddr);
> + ep = (struct boot_e820_entry *)tp->entries;
> +
> + pvh_bootparams.e820_entries = tp->nr_entries;
> +
> + for (idx = 0; idx < tp->nr_entries ; idx++, ep++)
> + pvh_bootparams.e820_table[idx] = *ep;
> + }
>   }
> - pvh_bootparams.e820_entries = memmap.nr_entries;
>  
>   if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) {
>   pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr =
> @@ -55,8 +80,9 @@ static void __init init_pvh_bootparams(void)
>   pvh_bootparams.e820_table[pvh_bootparams.e820_entries].type =
>   E820_TYPE_RESERVED;
>   pvh_bootparams.e820_entries++;
> - } else
> + } else if (xen_guest) {
>   xen_raw_printk("Warning: Can fit ISA range into e820\n");
> + }
>  
>   pvh_bootparams.hdr.cmd_line_ptr =
>   pvh_start_info.cmdline_paddr;
> @@ -76,7 +102,7 @@ static void __init init_pvh_bootparams(void)
>* environment (i.e. hardware_subarch 0).
>*/
>   pvh_bootparams.hdr.version = 0x212;
> - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */
> + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0;
>  }
>  
>  /*
> @@ -85,22 +111,32 @@ static void __init init_pvh_bootparams(void)
>   */
>  voi

[Xen-devel] [PATCH for-next] x86/traps: Drop redundant printk() in fatal_trap()

2017-11-29 Thread Andrew Cooper

show_page_walk() already prints the linear address of the walk, and
show_execution_state() has printed a raw %cr2 value.  This avoids having
two adjacent log lines with identical information.

  (XEN) Faulting linear address: 025ff028
  (XEN) Pagetable walk from 025ff028:
  ...

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/traps.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 86506f3..c9a849c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -650,11 +650,7 @@ void fatal_trap(const struct cpu_user_regs *regs, bool 
show_remote)
 show_execution_state(regs);
 
 if ( trapnr == TRAP_page_fault )
-{
-unsigned long cr2 = read_cr2();
-printk("Faulting linear address: %p\n", _p(cr2));
-show_page_walk(cr2);
-}
+show_page_walk(read_cr2());
 
 if ( show_remote )
 {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-next] x86/setup: Move reading of cached state earlier

2017-11-29 Thread Andrew Cooper

These are reads of registers which have already been set up, so are safe to do
at any point.  However, TLB flushes (e.g. from bootstrap_map()) don't function
until get_cpu_info()->cr4 is populated.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/setup.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 32bb02e..2b4b48c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -676,6 +676,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 smp_prepare_boot_cpu();
 sort_exception_tables();
 
+rdmsrl(MSR_EFER, this_cpu(efer));
+asm volatile ( "mov %%cr4, %0" : "=r" (get_cpu_info()->cr4) );
+
 setup_virtual_regions(__start___ex_table, __stop___ex_table);
 
 /* Full exception support from here on in. */
@@ -706,9 +709,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
 parse_video_info();
 
-rdmsrl(MSR_EFER, this_cpu(efer));
-asm volatile ( "mov %%cr4,%0" : "=r" (get_cpu_info()->cr4) );
-
 /* We initialise the serial devices very early so we can get debugging. */
 ns16550.io_base = 0x3f8;
 ns16550.irq = 4;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [BUG] incorrect goto in gnttab_setup_table overdecrements the preemption counter

2017-11-29 Thread Andrew Cooper

On 29/11/17 14:23, Jann Horn wrote:
> gnttab_setup_table() has the following code:
>
> =
> static long
> gnttab_setup_table(
> XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
> {
> struct gnttab_setup_table op;
> struct domain *d;
> struct grant_table *gt;
> inti;
> xen_pfn_t  gmfn;
>
> [...]
>
> d = rcu_lock_domain_by_any_id(op.dom);
> if ( d == NULL )
> {
> gdprintk(XENLOG_INFO, "Bad domid %d.\n", op.dom);
> op.status = GNTST_bad_domain;
> goto out2;
> }
>
> [...]
>  out2:
> rcu_unlock_domain(d);
>  out1:
> if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
> return -EFAULT;
>
> return 0;
> }
> =
> 
>
> This results in the following crash in a debug build of Xen 4.9.1:

Thanks for the report.

This was fixed in master by
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=5e436e7a45082ea2cadc176c19e1df46c178448f
but it looks like its not been backported to older releases.

Jan: Thoughts?  This isn't a security issue, but it would be better if
the stable trees had fewer asserts which could be hit.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [BUG] incorrect goto in gnttab_setup_table overdecrements the preemption counter

2017-11-29 Thread Andrew Cooper

On 29/11/17 14:34, Jann Horn wrote:
> On Wed, Nov 29, 2017 at 3:32 PM, Andrew Cooper
>  wrote:
>> On 29/11/17 14:23, Jann Horn wrote:
>>> gnttab_setup_table() has the following code:
>>>
>>> =
>>> static long
>>> gnttab_setup_table(
>>> XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
>>> {
>>> struct gnttab_setup_table op;
>>> struct domain *d;
>>> struct grant_table *gt;
>>> inti;
>>> xen_pfn_t  gmfn;
>>>
>>> [...]
>>>
>>> d = rcu_lock_domain_by_any_id(op.dom);
>>> if ( d == NULL )
>>> {
>>> gdprintk(XENLOG_INFO, "Bad domid %d.\n", op.dom);
>>> op.status = GNTST_bad_domain;
>>> goto out2;
>>> }
>>>
>>> [...]
>>>  out2:
>>> rcu_unlock_domain(d);
>>>  out1:
>>> if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
>>> return -EFAULT;
>>>
>>> return 0;
>>> }
>>> =
>>> 
>>>
>>> This results in the following crash in a debug build of Xen 4.9.1:
>> Thanks for the report.
>>
>> This was fixed in master by
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=5e436e7a45082ea2cadc176c19e1df46c178448f
>> but it looks like its not been backported to older releases.
> Urgh. I guess I really ought to fuzz master, not releases.

Actually, at this point it would be particularly helpful, as we are just
coming up to the 4.10 release.

The staging branch is slightly ahead of master at the moment (pending
completion of tests), and contains the fixes for the XSAs released
yesterday.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point

2017-11-29 Thread Andrew Cooper

On 29/11/17 14:47, Juergen Gross wrote:
> On 29/11/17 15:44, Paolo Bonzini wrote:
>> On 29/11/2017 15:25, Boris Ostrovsky wrote:
>> zeropage is x86/Linux-specific so we'd need some sort of firmware (like
>> grub) between a hypervisor and Linux to convert hvm_start_info to
>> bootparams.
> qemu?
>>> I think KVM folks didn't want to do this. I can't find the thread but I
>>> believe it was somewhere during Clear Containers discussion. Paolo?
>> QEMU is the right place to parse the ELF file and save it in memory.
>> You would have to teach QEMU to find the Xen note in ELF-format kernels
>> (just like it looks for the multiboot header), and use a different
>> option ROM ("pvhboot.c" for example).
>>
>> However I don't like to bypass the BIOS; for -kernel, KVM starts the
>> guest with an option ROM (linuxboot-dma.c or multiboot.S in QEMU
>> sources) that takes care of boot.
>>
>> In either case, you would have a new option ROM.  It could either be
>> very simple and similar to multiboot.S, or it could be larger and do the
>> same task as xen-pvh.S and enlighten_pvh.c (then get the address of
>> startup_32 or startup_64 from FW_CFG_KERNEL_ENTRY and jump there).  The
>> ugly part is that the option ROM would have to know more details about
>> what it is going to boot, including for example whether it's 32-bit or
>> 64-bit, so I don't really think it is a good idea.
> As grub2 doesn't have to know, qemu shouldn't have to know either.

An underlying requirement for this boot protocol was to remove the
requirement for a priori knowledge of the eventual mode of the guest,
which plagues Xen PV guests.  (One way or another, we need to parse the
kernel which will end up running to work out how to build the domain for
it.)

32bit flat mode is easy to set up, sufficiently large for any reasonable
bootstrapping, and provides no restrictions to what the eventual guest
wants to do.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/mm: drop bogus assertion

2017-11-30 Thread Andrew Cooper

On 30/11/17 09:10, Jan Beulich wrote:
> Olaf has observed this assertion to trigger after an aborted migration
> of a PV guest (it remains to be determined why there is a page fault in
> the first place here:
>
> (XEN) Xen call trace:
> (XEN)[] do_page_fault+0x39f/0x55c
> (XEN)[] x86_64/entry.S#handle_exception_saved+0x66/0xa4
> (XEN)[] __copy_to_user_ll+0x22/0x30
> (XEN)[] update_runstate_area+0x19c/0x228
> (XEN)[] domain.c#_update_runstate_area+0x11/0x39
> (XEN)[] context_switch+0x1fd/0xf25
> (XEN)[] schedule.c#schedule+0x303/0x6a8
> (XEN)[] softirq.c#__do_softirq+0x6c/0x95
> (XEN)[] do_softirq+0x13/0x15
> (XEN)[] x86_64/entry.S#process_softirqs+0x21/0x30
>
> i.e. the guest specified a runstate area address which has a non-present
> mapping in the page tables [EC=0002 CR2=88003d405220], but that's
> not something the hypervisor needs to be concerned about.) Release
> builds work fine, which is a first indication that the assertion isn't
> really needed.
>
> What's worse though - there appears to be a timing window where the
> guest runs in shadow mode, but not in log-dirty mode, and that is what
> triggers the assertion (the same could, afaict, be achieved by test-
> enabling shadow mode on a PV guest). This is because turing off log-
> dirty mode is being performed in two steps: First the log-dirty bit gets
> cleared (paging_log_dirty_disable() [having paused the domain] ->
> sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by
> unpausing the domain and only then clearing shadow mode (via
> shadow_test_disable(), which pauses the domain a second time).
>
> Hence besides removing the ASSERT() here (or optionally replacing it by
> explicit translate and refcounts mode checks, but this seems rather
> pointless now that the three are tied together) I wonder whether either
> shadow_one_bit_disable() should turn off shadow mode if no other bit
> besides PG_SH_enable remains set (just like shadow_one_bit_enable()
> enables it if not already set), or the domain pausing scope should be
> extended so that both steps occur without the domain getting a chance to
> run in between.
>
> Reported-by: Olaf Hering 
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1338,12 +1338,8 @@ static int fixup_page_fault(unsigned lon
>   */
>  if ( paging_mode_enabled(d) && !paging_mode_external(d) )
>  {
> -int ret;
> +int ret = paging_fault(addr, regs);
>  
> -/* Logdirty mode is the only expected paging mode for PV guests. */
> -ASSERT(paging_mode_only_log_dirty(d));

This assert was introduced for a specific reason, as a result of
d5c251c22b7, although the later bugfix 7b9f21cabc complicated things
somewhat.

The observation about shadow/logdirty happening in two phases does
invalidate the check in this form.  Its original purpose was to check
that we hadn't slipped into PV autotranslate mode, and I think that is
still worth keeping around.

How about reducing it to just a simple ASSERT(!paging_mode_translate(d)) ?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 for-4.10] x86: Avoid corruption on migrate for vcpus using CPUID Faulting

2017-11-30 Thread Andrew Cooper

On 27/11/17 14:41, Jan Beulich wrote:
>>>> On 27.11.17 at 14:02,  wrote:
>> Xen 4.8 and later virtualises CPUID Faulting support for guests.  However, 
>> the
>> value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
>> that the current cpuid faulting setting is lost on migrate/suspend/resume.
>>
>> To move this MSR, use the new guest_{rd,wr}msr() infrastructure.  This avoids
>> duplicating or opencoding the feature check and value logic, as well as
>> abstracting away the internal value representation.  One small adjustment to
>> guest_wrmsr() is required to cope with being called in toolstack context.
>>
>> Signed-off-by: Andrew Cooper 
> With the further intentions mentioned in the description (as a
> justification for some of the earlier requested changes to not
> be done), as indicated in a late response to v1
> Reviewed-by: Jan Beulich 

I thought that was already clear from the second paragraph.  Either way,
how about this?

Xen 4.8 and later virtualises CPUID Faulting support for guests. 
However, the
value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
that the current cpuid faulting setting is lost on migrate/suspend/resume.

Instead of following the MSR status quo, take the opportunity to make the
logic more generic, and in particular, trivial to extend for future MSRs.

This is done by discarding the notion of optional MSRs, and requiring the
toolstack to be prepared to move all of the MSRs, although only a subset
will
typically need to move.

This allows for the use of guest_{rd,wr}msr() alone to evaluate whether
an MSR
needs moving.  This is a benefit because it means there is a single piece of
logic responsible for evaluating whether a guest can use an MSR, and which
values are acceptable.

One small adjustment to guest_wrmsr() is required to cope with being
called in
toolstack context.

Signed-off-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/hvm: fix interaction between internal and external emulation

2017-12-01 Thread Andrew Cooper

On 01/12/17 16:55, Jan Beulich wrote:
>>>> Julien Grall  12/01/17 5:14 PM >>>
>> On 30/11/17 14:28, Jan Beulich wrote:
>>>>>> On 28.11.17 at 15:05,  wrote:
>>>> A call to handle_hvm_io_completion() is needed for completing I/O
>>>> that requires external emulation. Such completion should be requested when
>>>> hvm_vcpu_io_need_completion() returns true after hvm_emulate_once() has
>>>> completed. This is indicative of the underlying I/O emulation having
>>>> returned X86EMUL_RETRY and hence a re-emulation of the instruction is
>>>> needed to pick up the result of the I/O.
>>>>
>>>> A call to handle_hvm_io_completion() is NOT needed when the underlying
>>>> I/O has not returned X86EMUL_RETRY since there will be no result to pick
>>>> up. Hence it bogus to request such completion when mmio_retry is set,
>>>> since this can only happen if the underlying I/O emulation has returned
>>>> X86EMUL_OKAY (meaning the I/O has completed successfully).
>>>>
>>>> Reported-by: Andrew Cooper 
>>>> Signed-off-by: Paul Durrant 
>>>> Reviewed-by: Jan Beulich 
>>> Hmm, I notice Paul didn't Cc you on this one - despite it getting late,
>>> this is still something to be considered for 4.10. It's certainly going
>>> to be a backporting candidate.
>> Release-acked-by: Julien Grall 
> Thanks.
>
>> Could this be committed today?
> Not by me; I'm not in the office anymore. Perhaps Andrew could, together with
> the other (his) one you've sent an ack for.

I'll get these sorted, and put onto the 4.10 branch.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 7/9] x86/vvmx: Use correct sizes when reading operands

2017-12-01 Thread Andrew Cooper

On 28/11/17 08:32, Jan Beulich wrote:
 On 26.10.17 at 19:03,  wrote:
>> * invvpid has a 128-bit memory operand but we only require the VPID value
>>   which lies in the lower 64 bits.
> The memory operand (wrongly) isn't being read at all - I don't
> understand the above bullet point for that reason.
>
>> @@ -464,6 +462,8 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
>>  unsigned long base, index, seg_base, disp, offset;
>>  int scale, size;
>>  
>> +unsigned int bytes = vmx_guest_x86_mode(v);
>> +
> Except in extraordinary circumstances please don't add blank lines in
> the middle of declarations.
>
> Also - don't you get 2 here for 16-bit protected mode, which you'd
> need to convert to 4?

We can never reach this point from a 16 bit segment.  All VMX
instructions #UD if executed outside of a 64bit (in long mode) or 32bit
(outside of long mode) segment.

>
>> @@ -1801,7 +1803,7 @@ int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
>>  gpa = nvcpu->nv_vvmcxaddr;
>>  
>>  rc = hvm_copy_to_guest_linear(decode.op[0].mem, &gpa,
>> -  decode.op[0].len, 0, &pfinfo);
>> +  decode.op[0].bytes, 0, &pfinfo);
> Just like for vmptrld the operand size is uniformly 64 bits here.
>
>> @@ -1987,9 +1989,9 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
>>  {
>>  case INVEPT_SINGLE_CONTEXT:
>>  {
>> -unsigned long eptp;
>> +uint64_t eptp;
>>  
>> -ret = operand_read(&eptp, &decode.op[0], regs, decode.op[0].len);
>> +ret = operand_read(&eptp, &decode.op[0], regs, sizeof(eptp));
> I think you should read the full 128 bits for correct faulting behavior
> (also for invvpid). I also can't derive from the SDM that this reading
> won't occur for the INVEPT_ALL_CONTEXT case. Sadly the SDM
> doesn't say whether the reserved upper half is enforced to be zero
> (which we would then need to emulate), or whether it is being
> ignored. For invvpid however it does state that bits 16:63 are being
> checked.

Observations on real hardware to the contrary.  Each of the generations
I've tried never read the operand, or the part of the operand that they
don't need.

It makes sense from a performance point of view.  No point having
microcode make unnecessary memory accesses.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC] WIP: optee: add OP-TEE mediator

2017-12-04 Thread Andrew Cooper

On 01/12/17 22:58, Stefano Stabellini wrote:
>
> = Xen command forwarding =
>
> In the code below, it looks like Xen is forwarding everything to OP-TEE.
> Are there some commands Xen should avoid forwarding? Should we have a
> whitelist or a blacklist?

Whitelist everything.

At the very minimum, it means that patches to add to the whitelist get a
chance for review.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping#2: Re: [PATCH 2/2] x86: don't allow clearing of TF_kernel_mode for other than 64-bit PV

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:15, Jan Beulich wrote:
 On 03.07.17 at 16:56,  wrote:
> On 31.05.17 at 13:54,  wrote:
>> On 31.05.17 at 13:08,  wrote:
 On 31/05/17 08:15, Jan Beulich wrote:
> The flag is really only meant for those, both HVM and 32-bit PV tell
> kernel from user mode based on CPL/RPL. Remove the all-question-marks
> comment and let's be on the safe side here and also suppress clearing
> for 32-bit PV (this isn't a fast path after all).
>
> Signed-off-by: Jan Beulich 
 Wouldn't it just be safer to disallow starting a 64bit PV guest in user
 mode?

 No real kernel would do such a thing, and keeping the corner case around
 is bad from an attack-surface point of view.
>>> If it really was "starting a guest", I would probably agree. But we're
>>> talking about starting a vCPU, and I could see uses for this (not the
>>> least in XTF). After all the operation allows for enough state to be
>>> set up such that further initialization inside the guest may not be
>>> necessary.
>> Any opinion here, or change of opinion on the original patch?
> I'd really like to get this off my list.

My opinion is unchanged.  This isn't a useful piece of functionality,
and it definitely doesn't warrant the attack surface it brings.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/5] x86: make get_page_from_mfn() return struct page_info *

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:44, Jan Beulich wrote:
> Almost all users of it want it, and it calculates it anyway.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/5] x86: remove _PAGE_PSE check from get_page_from_l2e()

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:44, Jan Beulich wrote:
> With L2_DISALLOW_MASK containing _PAGE_PSE unconditionally as of commit
> 56fff3e5e9 ("x86: nuke PV superpage option and code") there's no point
> anymore in separately checking for the bit.
>
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1106,15 +1106,10 @@ get_page_from_l2e(
>  return -EINVAL;
>  }
>  
> -if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
> -{
> -rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0, 
> 0);
> -if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) 
> )
> -rc = 0;
> -return rc;
> -}
> -
> -return -EINVAL;
> +rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0, 0);
> +if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
> +rc = 0;

Newline here.  Otherwise, Reviewed-by: Andrew Cooper


> +return rc;
>  }
>  
>  
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/5] x86: improve _put_page_type() readability

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:45, Jan Beulich wrote:
> @@ -2477,10 +2478,9 @@ static int _put_page_type(struct page_in
>  continue;
>  /* We cleared the 'valid bit' so we do the clean up. */
>  rc = _put_final_page_type(page, x, preemptible, ptpg);
> -ptpg = NULL;
>  if ( x & PGT_partial )
>  put_page(page);
> -break;

Newline here.  Otherwise, Reviewed-by: Andrew Cooper


> +return rc;
>  }
>  
>  if ( !ptpg || !PGT_type_equal(x, ptpg->u.inuse.type_info) )
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] x86: use switch() in _put_page_type()

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:46, Jan Beulich wrote:
> Use this to cheaply add another assertion.
>
> Signed-off-by: Jan Beulich 
> ---
> TBD: Would it perhaps be better to return after the assertion?

Yes, otherwise we risk falling into an infinite continue loop.

With a suitable return value, Reviewed-by: Andrew Cooper


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/5] x86: make _get_page_type() a proper counterpart of _put_page_type() again

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:46, Jan Beulich wrote:
> @@ -2709,7 +2710,7 @@ int put_page_type_preemptible(struct pag
>  int get_page_type_preemptible(struct page_info *page, unsigned long type)
>  {
>  ASSERT(!current->arch.old_guest_table);

Newline.  Otherwise, Reviewed-by: Andrew Cooper 

> -return __get_page_type(page, type, 1);
> +return _get_page_type(page, type, true);
>  }
>  
>  int put_old_guest_table(struct vcpu *v)
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] x86/PoD: correctly handle non-order-0 decrease-reservation requests

2017-12-04 Thread Andrew Cooper

On 04/12/17 11:06, Jan Beulich wrote:
> p2m_pod_decrease_reservation() returning just (not) all-done is not

This would be easier to parse as "returning only all-done is not"

> sufficient for the caller: If some pages were processed,
> guest_remove_page() returning an error for those pages is the expected
> result rather than an indication of a problem. Make guest_remove_page()
> return a distinct error code for this very case, and special case
> handling in case of seeing this error code in decrease_reservation().
>
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -393,10 +393,10 @@ int guest_physmap_mark_populate_on_deman
>  return -ENOSYS;
>  }
>  
> -int p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn,
> - unsigned int order)
> +unsigned long p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn,
> +   unsigned int order)
>  {
> -return -ENOSYS;
> +return 0;
>  }
>  
>  static void p2m_set_permission(lpae_t *e, p2m_type_t t, p2m_access_t a)
> --- a/xen/arch/x86/mm/p2m-pod.c
> +++ b/xen/arch/x86/mm/p2m-pod.c
> @@ -510,11 +510,10 @@ p2m_pod_zero_check_superpage(struct p2m_
>   * Once both of these functions have been completed, we can return and
>   * allow decrease_reservation() to handle everything else.
>   */
> -int
> +unsigned long
>  p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn, unsigned int order)
>  {
> -int ret = 0;
> -unsigned long i, n;
> +unsigned long ret = 0, i, n;
>  struct p2m_domain *p2m = p2m_get_hostp2m(d);
>  bool_t steal_for_cache;
>  long pod, nonpod, ram;
> @@ -577,9 +576,9 @@ p2m_pod_decrease_reservation(struct doma
>  domain_crash(d);
>  goto out_unlock;
>  }
> -p2m->pod.entry_count -= 1UL << order;
> +ret = 1UL << order;
> +p2m->pod.entry_count -= ret;
>  BUG_ON(p2m->pod.entry_count < 0);
> -ret = 1;
>  goto out_entry_check;
>  }
>  
> @@ -630,6 +629,7 @@ p2m_pod_decrease_reservation(struct doma
>  p2m->pod.entry_count -= n;
>  BUG_ON(p2m->pod.entry_count < 0);
>  pod -= n;
> +ret += n;
>  }
>  else if ( steal_for_cache && p2m_is_ram(t) )
>  {
> @@ -664,16 +664,10 @@ p2m_pod_decrease_reservation(struct doma
>  
>  nonpod -= n;
>  ram -= n;
> +ret += n;
>  }
>  }
>  
> -/*
> - * If there are no more non-PoD entries, tell decrease_reservation() that
> - * there's nothing left to do.
> - */
> -if ( nonpod == 0 )
> -ret = 1;
> -
>  out_entry_check:
>  /* If we've reduced our "liabilities" beyond our "assets", free some */
>  if ( p2m->pod.entry_count < p2m->pod.count )
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -284,13 +284,15 @@ int guest_remove_page(struct domain *d,
>  
>  #ifdef CONFIG_X86
>  mfn = get_gfn_query(d, gmfn, &p2mt);
> +if ( unlikely(p2mt == p2m_invalid) || unlikely(p2mt == p2m_mmio_dm) )
> +return -ENOENT;

Newline.

>  if ( unlikely(p2m_is_paging(p2mt)) )
>  {
>  rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0);

Somewhere in this callchain, you truncate unsigned long to int.  It is
ok (I think) at the moment because ORDER_1G fits within int, but is
liable to break subtly in the future.

> -put_gfn(d, gmfn);
> -
>  if ( rc )
> -return rc;
> +goto out_put_gfn;
> +
> +put_gfn(d, gmfn);
>  
>  /* If the page hasn't yet been paged out, there is an
>   * actual page that needs to be released. */
> @@ -308,9 +310,7 @@ int guest_remove_page(struct domain *d,
>  if ( p2mt == p2m_mmio_direct )
>  {
>  rc = clear_mmio_p2m_entry(d, gmfn, mfn, PAGE_ORDER_4K);
> -put_gfn(d, gmfn);
> -
> -return rc;
> +goto out_put_gfn;
>  }
>  #else
>  mfn = gfn_to_mfn(d, _gfn(gmfn));
> @@ -335,10 +335,8 @@ int guest_remove_page(struct domain *d,
>  rc = mem_sharing_unshare_page(d, gmfn, 0);
>  if ( rc )
>  {
> -put_gfn(d, gmfn);
>  (void)mem_sharing_notify_enomem(d, gmfn, 0);
> -
> -return rc;
> +goto out_put_gfn;
>  }
>  /* Maybe the mfn changed */
>  mfn = get_gfn_query_unlocked(d, gmfn, &p2mt);
> @@ -375,9 +373,10 @@ int guest_remove_page(struct domain *d,
>  put_page(page);
>  
>  put_page(page);
> + out_put_gfn: __maybe_unused

What is this annotation for?

~Andrew

>  put_gfn(d, gmfn);
>  
> -return rc;
> +return rc != -ENOENT ? rc : -EINVAL;
>  }
>  
>  static void decrease_reservation(struct memop_args *a)
> @@ -392,6 +391,8 @@ static void decrease_reservation(struct
>  
>  for ( i = a->nr_done; i < a->nr_extents; i++ )
>  {
> +unsigned long pod_done;
> +
>  if ( i != a

Re: [Xen-devel] [PATCH 2/3] x86/mm: drop yet another relic of translated PV domains from new_guest_cr3()

2017-12-04 Thread Andrew Cooper

On 04/12/17 11:06, Jan Beulich wrote:
> The function can be called for PV domains only, which commit 5a0b9fba92
> ("x86/mm: drop further relics of translated PV domains") sort of
> realized, but not fully.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] x86/p2m: force return value checking of p2m_set_entry()

2017-12-04 Thread Andrew Cooper

On 04/12/17 11:07, Jan Beulich wrote:
> As XSAs 246 and 247 have shown, not doing so is rather dangerous.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] x86/IRQ: conditionally preserve access permission on map error paths

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:32, Jan Beulich wrote:
> Permissions that had been granted before should not be revoked when
> handling unrelated errors.
>
> Reported-by: HW42 
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1918,6 +1918,7 @@ int map_domain_pirq(
>  struct irq_desc *desc;
>  unsigned long flags;
>  DECLARE_BITMAP(prepared, MAX_MSI_IRQS) = {};
> +DECLARE_BITMAP(granted, MAX_MSI_IRQS) = {};
>  
>  ASSERT(spin_is_locked(&d->event_lock));
>  
> @@ -1951,13 +1952,17 @@ int map_domain_pirq(
>  return ret;
>  }
>  
> -ret = irq_permit_access(d, irq);
> -if ( ret )
> +if ( likely(!irq_access_permitted(d, irq)) )
>  {
> -printk(XENLOG_G_ERR
> -   "dom%d: could not permit access to IRQ%d (pirq %d)\n",
> -   d->domain_id, irq, pirq);
> -return ret;
> +ret = irq_permit_access(d, irq);
> +if ( ret )
> +{
> +printk(XENLOG_G_ERR
> +   "dom%d: could not permit access to IRQ%d (pirq %d)\n",
> +  d->domain_id, irq, pirq);
> +return ret;
> +}
> +__set_bit(0, granted);
>  }
>  
>  ret = prepare_domain_irq_pirq(d, irq, pirq, &info);
> @@ -2042,10 +2047,15 @@ int map_domain_pirq(
>  __set_bit(nr, prepared);
>  msi_desc[nr].irq = irq;
>  
> -if ( irq_permit_access(d, irq) != 0 )
> -printk(XENLOG_G_WARNING
> -   "dom%d: could not permit access to IRQ%d (pirq %d)\n",
> -   d->domain_id, irq, pirq);
> +if ( likely(!irq_access_permitted(d, irq)) )
> +{
> +if ( irq_permit_access(d, irq) )
> +printk(XENLOG_G_WARNING
> +   "dom%d: could not permit access to IRQ%d (pirq 
> %d)\n",
> +   d->domain_id, irq, pirq);
> +else
> +__set_bit(0, granted);
> +}
>  
>  desc = irq_to_desc(irq);
>  spin_lock_irqsave(&desc->lock, flags);
> @@ -2074,7 +2084,8 @@ int map_domain_pirq(
>  }
>  while ( nr )
>  {
> -if ( irq >= 0 && irq_deny_access(d, irq) )
> +if ( irq >= 0 && test_bit(nr, granted) &&

You only ever set bit 0 of granted, but you test each of them here. 
Something seems wrong.

Should the previous hunk be __set_bit(nr, granted) ?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] x86/MSI: leverage local variables

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:33, Jan Beulich wrote:
> ... instead of using redundant calculations.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

Now that pci_sbdf_t has been introduced, I should dust off my patch for %pP

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] x86: tighten MMU_PT_UPDATE check and combine error paths

2017-12-04 Thread Andrew Cooper

On 12/10/17 13:14, Jan Beulich wrote:
>>>> On 12.10.17 at 13:31,  wrote:
>> On 12/10/17 11:01, Jan Beulich wrote:
>>> Don't accept anything other than r/w RAM pages and move the paged-out
>>> check into the (unlikely) error path following that check.
>>>
>>> Signed-off-by: Jan Beulich 
>> How does dom0 boot with this change in place?  You appear to have
>> prohibited mapping MMIO frames.
> The page in question is a page table one, which can't be MMIO.
> Dom0 is booting fine.

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/3] x86: don't ignore foreigndom on L2/L3/L4 page table updates

2017-12-04 Thread Andrew Cooper

On 12/10/17 13:24, Jan Beulich wrote:
> Silently assuming DOMID_SELF is unlikely to be a good idea for page
> table updates. For PGT_writable pages, though, it seems better to allow
> the writes, so the same check isn't being applied there.
>
> Also add blank lines between the individual case blocks.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping#3: [PATCH v3] x86/HVM: don't #GP/#SS on wrapping virt->linear translations

2017-12-04 Thread Andrew Cooper

On 04/12/17 10:16, Jan Beulich wrote:
 On 25.08.17 at 16:59,  wrote:
> On 10.08.17 at 09:19,  wrote:
>> On 10.07.17 at 12:39,  wrote:
 Real hardware wraps silently in most cases, so we should behave the
 same. Also split real and VM86 mode handling, as the latter really
 ought to have limit checks applied.

 Signed-off-by: Jan Beulich 
 ---
 v3: Restore 32-bit wrap check for AMD.
 v2: Extend to non-64-bit modes. Reduce 64-bit check to a single
 is_canonical_address() invocation.
> Same here - I think I've been carrying this for long enough.

I'm not sure what to say.  I'm not comfortable taking this change
without a regression test in place, which also serves to demonstrate the
correctness of the change.

Its simply a matter of time, not any other objection to the change.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL

2017-12-05 Thread Andrew Cooper

On 05/12/2017 09:30, Jan Beulich wrote:
 On 05.12.17 at 09:49,  wrote:
>> flight 116832 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/116832/ 
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install  fail REGR. vs. 
>> 116744
> This is a blue screen, recurring, and has first been reported in flight
> 116779, i.e. was likely introduced in the batch ending in commit
> 4cd0fad645. Among those the most likely candidates appear to be
> the SVM changes (the failures are all on AMD hardware). The logs
> there also have huge amounts of "Unexpected nested vmexit",
> albeit not directly connected with the failed test afaict.

The unexpected nested vmexit is from a previous test, (and hopefully the
nested virt test, as that path shouldn't be reachable elsehow).

The windows boot which actually failed has:

Dec  5 04:20:08.735216 (XEN) CR access emulation failed (1): d1v0 64bit @ 
0010:f8000ce9e4ab -> 66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc
Dec  5 04:21:49.555130 (XEN) stdvga.c:173:d1v0 entering stdvga mode

which I expect is the root of the problem.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL

2017-12-05 Thread Andrew Cooper

On 05/12/2017 10:03, Andrew Cooper wrote:
> On 05/12/2017 09:30, Jan Beulich wrote:
>>>>> On 05.12.17 at 09:49,  wrote:
>>> flight 116832 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/116832/ 
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install  fail REGR. vs. 
>>> 116744
>> This is a blue screen, recurring, and has first been reported in flight
>> 116779, i.e. was likely introduced in the batch ending in commit
>> 4cd0fad645. Among those the most likely candidates appear to be
>> the SVM changes (the failures are all on AMD hardware). The logs
>> there also have huge amounts of "Unexpected nested vmexit",
>> albeit not directly connected with the failed test afaict.
> The unexpected nested vmexit is from a previous test, (and hopefully the
> nested virt test, as that path shouldn't be reachable elsehow).
>
> The windows boot which actually failed has:
>
> Dec  5 04:20:08.735216 (XEN) CR access emulation failed (1): d1v0 64bit @ 
> 0010:f8000ce9e4ab -> 66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc
> Dec  5 04:21:49.555130 (XEN) stdvga.c:173:d1v0 entering stdvga mode
>
> which I expect is the root of the problem.

Yes - that is the cause of the problem.  The BSOD is a 0x3D (exception
not handled) referencing the same %rip as the CR emulation failure.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL

2017-12-05 Thread Andrew Cooper

On 05/12/17 11:16, Jan Beulich wrote:
 On 05.12.17 at 11:03,  wrote:
>> On 05/12/2017 09:30, Jan Beulich wrote:
>> On 05.12.17 at 09:49,  wrote:
 flight 116832 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/116832/ 

 Regressions :-(

 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install  fail REGR. vs. 
>> 116744
>>> This is a blue screen, recurring, and has first been reported in flight
>>> 116779, i.e. was likely introduced in the batch ending in commit
>>> 4cd0fad645. Among those the most likely candidates appear to be
>>> the SVM changes (the failures are all on AMD hardware). The logs
>>> there also have huge amounts of "Unexpected nested vmexit",
>>> albeit not directly connected with the failed test afaict.
>> The unexpected nested vmexit is from a previous test, (and hopefully the
>> nested virt test, as that path shouldn't be reachable elsehow).
>>
>> The windows boot which actually failed has:
>>
>> Dec  5 04:20:08.735216 (XEN) CR access emulation failed (1): d1v0 64bit @ 
>> 0010:f8000ce9e4ab -> 66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc
> How did I not spot this? This is a REP INSW, which then makes it
> far more likely to be a result of Paul's 9c9384d6d8. Sadly the
> windows-install tests of the two 4.10 flights we've had so far all
> ran on Intel hardware, so we can't easily tell whether the
> problem is present there as well (in which case it would for sure
> be that commit).
>
> For the moment I'm struggling to understand how we can end up
> on the CR access emulation path here, but I'll take a closer look.

I am equally perplexed.  The SVM code definitely used to (ab)use the
MMIO path, so I expect there is still some remnants left.

The CR access in the message proves that we started this emulation from
a CR vmexit.  My best guess is that _hvm_emulate_one() reused the
instruction cache rather than starting fresh.  The unhandleable is
probably from a ->validate() failure, and the bytes are probably stale
from the previous emulation.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]

2017-12-05 Thread Andrew Cooper

On 05/12/17 15:31, Jan Beulich wrote:
 On 05.12.17 at 16:05,  wrote:
>> Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 116832: regressions 
>> - 
>> FAIL"):
>>> This is a blue screen, recurring, and has first been reported in flight
>>> 116779, i.e. was likely introduced in the batch ending in commit
>>> 4cd0fad645. Among those the most likely candidates appear to be
>>> the SVM changes (the failures are all on AMD hardware). The logs
>>> there also have huge amounts of "Unexpected nested vmexit",
>>> albeit not directly connected with the failed test afaict.
>> Ian Jackson writes ("Re: [xen-unstable test] 116832: regressions - FAIL"):
>>> This is the expected Windows failure.  Force pushed.
>> Oops.  Sorry about that.
>>
>> I think this goes to show that (i) leaving known failures languishing
>> for months and expecting them to be force pushed results in human
>> error (ii) I should read the whole email thread first.
> Oh, that's pretty unfortunate. I think we'll then need a custom flight
> tied to the box that this failure occurred on, to have a way to tell
> whether the fix I'm about to prepare has actually helped, the more
> that the same issue is presumably also present on the 4.10 branch.
> Thing is that newer AMD hardware (with decode assist) doesn't
> appear to demonstrate the misbehavior, and for some reason it also
> doesn't show on Intel systems.
>
> I've spent quite a bit of time to repro this on my old AMD box, but the
> distro on there is just too old to be able to start a suitable Windows
> guest (part(?) of the reason being that scripts in /etc/xen/scripts
> appear to get invoked alongside the ones from the separate unstable
> install tree, and at some point I then decided to give up trying to hack
> things up so they would work together again).

If you've got a provisional patch, I can get some testing organised on
newer and older hardware.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/HVM: don't retain emulated insn cache when exiting back to guest

2017-12-06 Thread Andrew Cooper

On 06/12/17 13:04, Julien Grall wrote:
> Hi Jan,
>
> On 12/06/2017 12:58 PM, Jan Beulich wrote:
>>>>> On 06.12.17 at 12:47,  wrote:
>>> On 12/06/2017 11:45 AM, Jan Beulich wrote:
>>>>>>> On 06.12.17 at 10:47,  wrote:
>>>>> I guess I have been CCed because you would like this patch is
>>>>> fixing the
>>>>> regression you mentioned on IRC?
>>>>
>>>> Yes, but first of all we need to see whether the issue goes away in
>>>> master once the patch is in.
>>>
>>> Would reverting the offending patch in Xen 4.10 be solution?
>>
>> Probably yes, at the price of re-introducing the issue that change
>> did fix. But reverting wouldn't feel right: The change, after all,
>> was not buggy - it merely uncovered the other issue, as far as we
>> can tell.
>
> I understand. I have seen you pushed the fixes in master today. Let
> see how it perform and decide tomorrow what to do.

XenServer testing has identified this bug, and shown the bug to be fixed
with this patch in place.

Therefore, a retroactive Tested-by: Andrew Cooper


No other emulation issues seen running windows on this particular piece
of old AMD hardware.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] domctl: improve locking during domain destruction

2017-12-06 Thread Andrew Cooper

On 06/12/17 15:31, Jan Beulich wrote:
> There is no need to hold the global domctl lock across domain_kill() -
> the domain lock is fully sufficient here, and parallel cleanup after
> multiple domains performs quite a bit better this way.
>
> Signed-off-by: Jan Beulich 

This is clearer.  Reviewed-by: Andrew Cooper 

> ---
> Changes since RFC: Comment added.
> ---
> Obviously other domctl-s could benefit from similar adjustments, so
> this is meant to be just a start.
>
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -615,13 +615,21 @@ int domain_kill(struct domain *d)
>  if ( d == current->domain )
>  return -EINVAL;
>  
> -/* Protected by domctl_lock. */
> +/* Protected by d->domain_lock. */
>  switch ( d->is_dying )
>  {
>  case DOMDYING_alive:
> +domain_unlock(d);
>  domain_pause(d);
> +domain_lock(d);
> +/*
> + * With the domain lock dropped, d->is_dying may have changed. Call
> + * ourselves recursively if so, which is safe as then we won't come
> + * back here.
> + */
> +if ( d->is_dying != DOMDYING_alive )
> +return domain_kill(d);
>  d->is_dying = DOMDYING_dying;
> -spin_barrier(&d->domain_lock);
>  evtchn_destroy(d);
>  gnttab_release_mappings(d);
>  tmem_destroy(d->tmem_client);
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -665,11 +665,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
>  break;
>  
>  case XEN_DOMCTL_destroydomain:
> +domctl_lock_release();
> +domain_lock(d);
>  ret = domain_kill(d);
> +domain_unlock(d);
>  if ( ret == -ERESTART )
>  ret = hypercall_create_continuation(
>  __HYPERVISOR_domctl, "h", u_domctl);
> -break;
> +goto domctl_out_unlock_domonly;
>  
>  case XEN_DOMCTL_setnodeaffinity:
>  {
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] pdx: correct indentation

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:19, Jan Beulich wrote:
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm: don't use domain_shutdown() when re-offlining a page

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:20, Jan Beulich wrote:
> It goes all silent, leaving open what has actually caused the crash.
> Use domain_crash() instead, which leaves a log message before calling
> domain_shutdown(..., SHUTDOWN_crash).
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xenmem_add_to_physmap_one() has no need to know of XENMAPSPACE_gmfn_range

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:21, Jan Beulich wrote:
> As its name says, it handles a single GMFN only anyway. Note that ARM
> needs no adjustment, as it doesn't handle the two types at all.
>
> Also take the opportunity and clean up the handling of XENMAPSPACE_gmfn
> a little: There's no point in going through "idx" when capturing the MFN.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

This looks to be based on some of your other mm cleanup, which isn't yet
in staging.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] simplify xenmem_add_to_physmap_batch()

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:21, Jan Beulich wrote:
> There's no need for
> - advancing the handles and at the same time using
>   __copy_{from,to}_guest_offset(),
> - an "out" label,
> - local variables "done" and (function scope) "rc".
>
> Signed-off-by: Jan Beulich 

These changes do mean that start is no longer ideally named.  How about
s/start/extent/, which also matches up better with the parameter name
passed in by its caller?

~Andrew

>
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -820,65 +820,37 @@ static int xenmem_add_to_physmap_batch(s
> struct xen_add_to_physmap_batch 
> *xatpb,
> unsigned int start)
>  {
> -unsigned int done = 0;
> -int rc;
> -
>  if ( xatpb->size < start )
>  return -EILSEQ;
>  
> -guest_handle_add_offset(xatpb->idxs, start);
> -guest_handle_add_offset(xatpb->gpfns, start);
> -guest_handle_add_offset(xatpb->errs, start);
> -xatpb->size -= start;
> -
> -if ( !guest_handle_okay(xatpb->idxs, xatpb->size) ||
> - !guest_handle_okay(xatpb->gpfns, xatpb->size) ||
> - !guest_handle_okay(xatpb->errs, xatpb->size) )
> +if ( !guest_handle_subrange_okay(xatpb->idxs, start, xatpb->size - 1) ||
> + !guest_handle_subrange_okay(xatpb->gpfns, start, xatpb->size - 1) ||
> + !guest_handle_subrange_okay(xatpb->errs, start, xatpb->size - 1) )
>  return -EFAULT;
>  
> -while ( xatpb->size > done )
> +while ( xatpb->size > start )
>  {
>  xen_ulong_t idx;
>  xen_pfn_t gpfn;
> +int rc;
>  
> -if ( unlikely(__copy_from_guest_offset(&idx, xatpb->idxs, 0, 1)) )
> -{
> -rc = -EFAULT;
> -goto out;
> -}
> -
> -if ( unlikely(__copy_from_guest_offset(&gpfn, xatpb->gpfns, 0, 1)) )
> -{
> -rc = -EFAULT;
> -goto out;
> -}
> +if ( unlikely(__copy_from_guest_offset(&idx, xatpb->idxs, start, 1)) 
> ||
> + unlikely(__copy_from_guest_offset(&gpfn, xatpb->gpfns, start, 
> 1)) )
> +return -EFAULT;
>  
>  rc = xenmem_add_to_physmap_one(d, xatpb->space,
> xatpb->u,
> idx, _gfn(gpfn));
>  
> -if ( unlikely(__copy_to_guest_offset(xatpb->errs, 0, &rc, 1)) )
> -{
> -rc = -EFAULT;
> -goto out;
> -}
> -
> -guest_handle_add_offset(xatpb->idxs, 1);
> -guest_handle_add_offset(xatpb->gpfns, 1);
> -guest_handle_add_offset(xatpb->errs, 1);
> +if ( unlikely(__copy_to_guest_offset(xatpb->errs, start, &rc, 1)) )
> +return -EFAULT;
>  
>  /* Check for continuation if it's not the last iteration. */
> -if ( xatpb->size > ++done && hypercall_preempt_check() )
> -{
> -rc = start + done;
> -goto out;
> -}
> +if ( xatpb->size > ++start && hypercall_preempt_check() )
> +return start;
>  }
>  
> -rc = 0;
> -
> -out:
> -return rc;
> +return 0;
>  }
>  
>  static int construct_memop_from_reservation(
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] VMX: drop bogus gpa parameter from __invept()

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:28, Jan Beulich wrote:
> Perhaps there once was a plan to have a flush type requiring this, but
> the current SDM has no mention of such and all callers pass zero anyway.
>
> Take the opportunity and also change involved types to uint64_t.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/HVM: make explicit that hvm_print_line() does output only

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:27, Jan Beulich wrote:
> On input "c" being 0xff should already have the effect of bailing early
> (due to the isprint()), but let's rather make this explicit. Also
> convert the BUG_ON() to an ASSERT() (nothing fatal happens in the
> function if this is violated), at the same time extending what is being
> checked.
>
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -544,10 +544,14 @@ static int hvm_print_line(
>  struct domain *cd = current->domain;
>  char c = *val;
>  
> -BUG_ON(bytes != 1);
> +ASSERT(bytes == 1 || port == 0xe9);
>  
> -/* Accept only printable characters, newline, and horizontal tab. */
> -if ( !isprint(c) && (c != '\n') && (c != '\t') )
> +/*
> + * Ignore any input requests and accept only printable characters,
> + * newline, and horizontal tab.
> + */
> +if ( dir != IOREQ_WRITE ||
> + (!isprint(c) && (c != '\n') && (c != '\t')) )
>  return X86EMUL_OKAY;

Given that there is no functionality on the read side, it should be
explicitly terminated with ~0 rather than ignored.

Otherwise, Reviewed-by: Andrew Cooper 

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/intel: Drop zeroed-out select_idle_routine() function

2017-12-06 Thread Andrew Cooper

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/cpu/intel.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index ac932e5..d3145c0 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -15,8 +15,6 @@
 
 #include "cpu.h"
 
-#define select_idle_routine(x) ((void)0)
-
 static bool __init probe_intel_cpuid_faulting(void)
 {
uint64_t x;
@@ -375,7 +373,6 @@ static void init_intel(struct cpuinfo_x86 *c)
/* Detect the extended topology information if available */
detect_extended_topology(c);
 
-   select_idle_routine(c);
l2 = init_intel_cacheinfo(c);
if (c->cpuid_level > 9) {
unsigned eax = cpuid_eax(10);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86: improve MSR_SHADOW_GS accesses

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:37, Jan Beulich wrote:
> --- a/xen/include/asm-x86/msr.h
> +++ b/xen/include/asm-x86/msr.h
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -172,6 +173,24 @@ static inline unsigned long rdgsbase(voi
>  return base;
>  }
>  
> +static inline unsigned long rdgsshadow(void)
> +{
> +unsigned long base;
> +
> +alternative_io("mov %[msr], %%ecx\n\t"
> +   "rdmsr\n\t"
> +   "shl $32, %%rdx\n\t"
> +   "or %%rdx, %[res]",

There needs to be some clearer distinction between the two
alternatives.  It took a while for me to spot this comma.

> +   "swapgs\n\t"
> +   ".byte 0xf3, 0x48, 0x0f, 0xae, 0xc8\n\t" /* rdgsbase rax 
> */
> +   "swapgs",
> +   X86_FEATURE_FSGSBASE,
> +   [res] "=&a" (base),
> +   [msr] "i" (MSR_SHADOW_GS_BASE) : "rcx", "rdx");

MSR should be a "c" constraint, and the clobber dropped.  It will result
in better code in most of the callsites, by avoiding a reload of ecx,
and merging of the higher constant with the other MSR accesses.

I'm not entirely sure the alternative is justified here.  For
consistency alone, these helpers should match their companions, and in
the unlikely case that the runtime feature test does make a measurable
difference, wouldn't a static key be a better option anyway?

> +
> +return base;
> +}
> +
>  static inline void wrfsbase(unsigned long base)
>  {
>  if ( cpu_has_fsgsbase )
> @@ -196,6 +215,19 @@ static inline void wrgsbase(unsigned lon
>  wrmsrl(MSR_GS_BASE, base);
>  }
>  
> +static inline void wrgsshadow(unsigned long base)
> +{
> +alternative_input("mov %[msr], %%ecx\n\t"
> +  "shld $32, %[val], %%rdx\n\t"

This is a vector path instruction and specifically called out to be
avoided in the AMD optimisation guide.  On all hardware (according to
Agner's latency mesurements) it alone has a longer latency to execute
that the mov/shl pair you've replaced, and that is before accounting for
mov elimination.

~Andrew

> +  "wrmsr",
> +  "swapgs\n\t"
> +  ".byte 0xf3, 0x48, 0x0f, 0xae, 0xd8\n\t" /* wrgsbase 
> rax */
> +  "swapgs",
> +  X86_FEATURE_FSGSBASE,
> +  [val] "a" (base),
> +  [msr] "i" (MSR_SHADOW_GS_BASE) : "rcx", "rdx");
> +}
> +
>  DECLARE_PER_CPU(u64, efer);
>  u64 read_efer(void);
>  void write_efer(u64 val);
>
>
>
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] x86: rename DIRTY_GS_BASE_USER

2017-12-06 Thread Andrew Cooper

On 06/12/17 16:38, Jan Beulich wrote:
> As of commit 91f85280b9 ("x86: fix GS-base-dirty determination") the
> USER part of it isn't really appropriate anymore.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/2] x86/vmx: Don't use hvm_inject_hw_exception() in long_mode_do_msr_write()

2017-12-06 Thread Andrew Cooper

Since c/s 49de10f3c1718 "x86/hvm: Don't raise #GP behind the emulators back
for MSR accesses", returnning X86EMUL_EXCEPTION has pushed the exception
generation to the top of the call tree.

Using hvm_inject_hw_exception() and returning X86EMUL_EXCEPTION causes a
double #GP injection, which combines to #DF.

While fixing this up, rename uncanonical_address to the more common gp_fault,
and drop the HVM_DBG_LOG() line which is redundant given the two adjacent
lines.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Jun Nakajima 
CC: Kevin Tian 

This wants backporting to 4.9
---
 xen/arch/x86/hvm/vmx/vmx.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b18ccea..426902b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -542,7 +542,7 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 case MSR_GS_BASE:
 case MSR_SHADOW_GS_BASE:
 if ( !is_canonical_address(msr_content) )
-goto uncanonical_address;
+goto gp_fault;
 
 if ( msr == MSR_FS_BASE )
 __vmwrite(GUEST_FS_BASE, msr_content);
@@ -560,14 +560,14 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 
 case MSR_LSTAR:
 if ( !is_canonical_address(msr_content) )
-goto uncanonical_address;
+goto gp_fault;
 v->arch.hvm_vmx.lstar = msr_content;
 wrmsrl(MSR_LSTAR, msr_content);
 break;
 
 case MSR_CSTAR:
 if ( !is_canonical_address(msr_content) )
-goto uncanonical_address;
+goto gp_fault;
 v->arch.hvm_vmx.cstar = msr_content;
 break;
 
@@ -582,9 +582,7 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 
 return HNDL_done;
 
- uncanonical_address:
-HVM_DBG_LOG(DBG_LEVEL_MSR, "Not cano address of msr write %x", msr);
-hvm_inject_hw_exception(TRAP_gp_fault, 0);
+ gp_fault:
 return HNDL_exception_raised;
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] x86/vmx: Drop enum handler_return

2017-12-06 Thread Andrew Cooper

They are straight aliases of the more common X86EMUL_* constants.  While
adjusting these, fix the case indentation where appropriate.

No functional change, confirmed by diff'ing the compiled binary.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Jun Nakajima 
CC: Kevin Tian 
---
 xen/arch/x86/hvm/vmx/vmx.c | 66 ++
 1 file changed, 31 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 426902b..ea98a4e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -62,8 +62,6 @@
 static bool_t __initdata opt_force_ept;
 boolean_param("force-ept", opt_force_ept);
 
-enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised };
-
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
 
@@ -485,8 +483,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
 passive_domain_destroy(v);
 }
 
-static enum handler_return
-long_mode_do_msr_read(unsigned int msr, uint64_t *msr_content)
+static int long_mode_do_msr_read(unsigned int msr, uint64_t *msr_content)
 {
 struct vcpu *v = current;
 
@@ -521,16 +518,15 @@ long_mode_do_msr_read(unsigned int msr, uint64_t 
*msr_content)
 break;
 
 default:
-return HNDL_unhandled;
+return X86EMUL_UNHANDLEABLE;
 }
 
 HVM_DBG_LOG(DBG_LEVEL_MSR, "msr %#x content %#"PRIx64, msr, *msr_content);
 
-return HNDL_done;
+return X86EMUL_OKAY;
 }
 
-static enum handler_return
-long_mode_do_msr_write(unsigned int msr, uint64_t msr_content)
+static int long_mode_do_msr_write(unsigned int msr, uint64_t msr_content)
 {
 struct vcpu *v = current;
 
@@ -577,13 +573,13 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 break;
 
 default:
-return HNDL_unhandled;
+return X86EMUL_UNHANDLEABLE;
 }
 
-return HNDL_done;
+return X86EMUL_OKAY;
 
  gp_fault:
-return HNDL_exception_raised;
+return X86EMUL_EXCEPTION;
 }
 
 /*
@@ -2937,12 +2933,12 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 goto done;
 switch ( long_mode_do_msr_read(msr, msr_content) )
 {
-case HNDL_unhandled:
-break;
-case HNDL_exception_raised:
-return X86EMUL_EXCEPTION;
-case HNDL_done:
-goto done;
+case X86EMUL_UNHANDLEABLE:
+break;
+case X86EMUL_EXCEPTION:
+return X86EMUL_EXCEPTION;
+case X86EMUL_OKAY:
+goto done;
 }
 
 if ( vmx_read_guest_msr(msr, msr_content) == 0 )
@@ -3161,24 +3157,24 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 
 switch ( long_mode_do_msr_write(msr, msr_content) )
 {
-case HNDL_unhandled:
-if ( (vmx_write_guest_msr(msr, msr_content) != 0) &&
- !is_last_branch_msr(msr) )
-switch ( wrmsr_hypervisor_regs(msr, msr_content) )
-{
-case -ERESTART:
-return X86EMUL_RETRY;
-case 0:
-case 1:
-break;
-default:
-goto gp_fault;
-}
-break;
-case HNDL_exception_raised:
-return X86EMUL_EXCEPTION;
-case HNDL_done:
-break;
+case X86EMUL_UNHANDLEABLE:
+if ( (vmx_write_guest_msr(msr, msr_content) != 0) &&
+ !is_last_branch_msr(msr) )
+switch ( wrmsr_hypervisor_regs(msr, msr_content) )
+{
+case -ERESTART:
+return X86EMUL_RETRY;
+case 0:
+case 1:
+break;
+default:
+goto gp_fault;
+}
+break;
+case X86EMUL_EXCEPTION:
+return X86EMUL_EXCEPTION;
+case X86EMUL_OKAY:
+break;
 }
 break;
 }
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 01/25] x86emul: make decode_register() return unsigned long *

2017-12-07 Thread Andrew Cooper

On 07/12/17 13:58, Jan Beulich wrote:
> Quite a few casts can be dropped this way, and type-safeness is being
> increased by not using void * (same goes for decode_vex_gpr()). Drop
> casts and no longer needed intermediate variables where possible. Take
> the opportunity and also switch the last parameter to bool.
>
> Signed-off-by: Jan Beulich 

This will need rebasing over 053ae230b1 but that only adjusts the
parameters type of index so shouldn't cause further problems.

However, is this wise?  I can certainly see the attraction for not
needing to casing away from void *, you now give the impression that it
is safe to deference the returned pointer as an unsigned long, even in
the cases where it isn't safe.

At least with returning void*, the required cast highlights that
something special is going on.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 02/25] x86emul: build SIMD tests with -Os

2017-12-07 Thread Andrew Cooper

On 07/12/17 13:59, Jan Beulich wrote:
> Specifically in the context of putting together subsequent patches I've
> noticed that together with the touch() macro using -Os further
> increases the chances of the compiler using memory operands for the
> instructions we actually care to test.
>
> Signed-off-by: Jan Beulich 
> Reviewed-by: George Dunlap 
>

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages

2017-12-11 Thread Andrew Cooper

On 11/12/17 09:14, Jan Beulich wrote:
 On 08.12.17 at 13:42,  wrote:
>> On 12/08/2017 02:18 PM, Jan Beulich wrote:
>> On 24.10.17 at 12:19,  wrote:
 HVMOP_altp2m_set_mem_access_multi has been added as a HVMOP (as opposed to 
 a
 DOMCTL) for consistency with its HVMOP_altp2m_set_mem_access counterpart 
 (and
 hence with the original altp2m design, where domains are allowed - with the
 proper altp2m access rights - to alter these settings), in the absence of 
 an
 official position on the issue from the original altp2m designers.
>>> I continue to disagree with this reasoning. I'm afraid I'm not really
>>> willing to allow widening the badness, unless altp2m was formally
>>> documented security-unsupported.
>> Going the DOMCTL route here would have been the (much easier) solution,
>> and in fact, as stated before, there has been an attempt to do so -
>> however, IIRC Andrew has insisted that we should take care to use
>> consistent access privilege across altp2m operations.
> Andrew, is that the case (I don't recall anything like that)?

My suggestion was that we don't break usecases.  The Intel usecase
specifically is for an in-guest entity to have full control of all
altp2m functionality, and this is fine (security wise) when permitted to
do so by the toolstack.

~Andrew

>
>> This was followed by a lengthy xen-devel discussion and several
>> unsuccessful attempts to obtain an official position from the original
>> contributors, at which point (after several months), as also discussed
>> at the Xen Developer Summit in Budapest, we decided to press on in the
>> direction that had seemed the most compatible with the original altp2m
>> design. (Please correct me if I'm misremembering or misunderstanding
>> something.)
>>
>> So at this point it looks like we're stuck again: we're happy to go in
>> any direction the maintainers decide is the best, but we do need to
>> decide on one.
>>
>> FWIW, Tamas (CC added) has added code to restrict where altp2m calls can
>> come from (although that's not XSM code).
>>
>> Please let us know how to proceed.
> I've given my suggestion already: Now that we have SUPPORT.md,
> submit a patch to add altp2m there (not sure if it was in the part of
> George's series that was left out for the moment), stating it's
> security unsupported. With that's I still wouldn't like the addition by
> this patch, but I also wouldn't object to this widening of an already
> bad situation anymore: Anyone wanting to alter that support status
> would first need to deal with the too wide exposure of some of the
> operations.
>
> Jan
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC] xen/arm: Handling cache maintenance instructions by set/way

2017-12-11 Thread Andrew Cooper

On 11/12/17 10:06, Jan Beulich wrote:
 On 08.12.17 at 15:38,  wrote:
>> On 08/12/17 08:03, Tim Deegan wrote:
>>> It should be possible to do something like the misconfigured-entry bit
>>> trick by _allocating_ the memory up-front and building the p2m entries
>>> but only making them usable by the {IO}MMUs on first access.  That
>>> would make these early p2m walks shorter (because they can skip whole
>>> subtrees that aren't marked present yet) without making major changes
>>> to domain build or introducing run-time failures.
>> I am not aware of any way on Arm to misconfigure an entry. We do have 
>> valid and access bits, although they will affect the IOMMU as well. So 
>> it will not be possible to get page-table sharing with this "feature" 
>> enabled.
> How would you intend to solve the IOMMU part of the problem with
> PoD? As was pointed out before - IOMMU and PoD are incompatible
> on x86.

Not only that.

The use of an IOMMU is incompatible with any HAP scheme using EPT/NPT
violations to trigger hypervisor work, and will remain the case until
such time as IOMMUs gain restartable pagefaults.  The chances of this
happening are tantamount to zero, due to timing requirements in the
PCI(e) spec.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/E820: don't overrun array

2017-12-12 Thread Andrew Cooper

On 12/12/17 11:10, Jan Beulich wrote:
> The bounds check needs to be done after the increment, not before, or
> else it needs to use a one lower immediate. Also use word operations
> rather than byte ones for both the increment and the compare (allowing
> E820_BIOS_MAX to be more easily bumped, should the need ever arise).
>
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/boot/mem.S
> +++ b/xen/arch/x86/boot/mem.S
> @@ -22,11 +22,10 @@ get_memory_map:
>  cmpl$SMAP,%eax  # check the return is `SMAP'
>  jne .Lmem88
>  
> -movbbootsym(e820nr),%al # up to 128 entries
> -cmpb$E820_BIOS_MAX,%al
> +incwbootsym(e820nr)
> +cmpw$E820_BIOS_MAX,bootsym(e820nr)  # up to this many entries

Space after the comma here please.

Given your subsequent instruction scheduling patch, why the word
operations? 32bit operations are faster than 16bit.

As e820nr is already a 32bit value, I'd just move them straight to
incl/cmpl.

~Andrew

>  jae .Lmem88
>  
> -incbbootsym(e820nr)
>  movw%di,%ax
>  addw$20,%ax
>  movw%ax,%di
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] x86/E820: improve insn selection

2017-12-12 Thread Andrew Cooper

On 12/12/17 11:10, Jan Beulich wrote:
> ..., largely to shrink code size a little:
> - use TEST instead of CMP with zero immediate
> - use MOVZWL instead of AND with 0x immediate
> - compute final highmem_bk value in registers, accessing memory just
>   once
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper , altbeit it
preferably with space in the first hunk.

Any chance we can drop redundant size suffixes as we go?

>
> --- a/xen/arch/x86/boot/mem.S
> +++ b/xen/arch/x86/boot/mem.S
> @@ -29,8 +29,8 @@ get_memory_map:
>  movw%di,%ax
>  addw$20,%ax
>  movw%ax,%di
> -cmpl$0,%ebx # check to see if
> -jne 1b  # %ebx is set to EOF
> +testl   %ebx,%ebx   # check to see if
> +jnz 1b  # %ebx is set to EOF
>  
>  .Lmem88:
>  movb$0x88, %ah
> @@ -48,17 +48,17 @@ get_memory_map:
>  int $0x15
>  jc  .Lint12
>  
> -cmpw$0x0, %cx   # Kludge to handle BIOSes
> -jne 1f  # which report their extended
> -cmpw$0x0, %dx   # memory in AX/BX rather than
> -jne 1f  # CX/DX.  The spec I have 
> read
> +testw   %cx, %cx# Kludge to handle BIOSes
> +jnz 1f  # which report their extended
> +testw   %dx, %dx# memory in AX/BX rather than
> +jnz 1f  # CX/DX.  The spec I have 
> read
>  movw%ax, %cx# seems to indicate AX/BX 
>  movw%bx, %dx# are more reasonable 
> anyway...
> -1:  andl$0x,%edx# clear sign extend
> +1:  movzwl  %dx, %edx
>  shll$6,%edx # and go from 64k to 1k 
> chunks
> +movzwl  %cx, %ecx
> +addl%ecx, %edx  # add in lower memory
>  movl%edx,bootsym(highmem_kb)# store extended memory size
> -andl$0x,%ecx# clear sign extend
> -addl%ecx,bootsym(highmem_kb)# and add lower memory into
>  
>  .Lint12:
>  int $0x12
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v14 07/11] x86/mm: add an extra command to HYPERVISOR_mmu_update...

2017-12-12 Thread Andrew Cooper

On 12/12/17 13:25, Jan Beulich wrote:
 On 28.11.17 at 16:08,  wrote:
>> @@ -1905,7 +1906,8 @@ static int mod_l1_entry(l1_pgentry_t *pl1e, 
>> l1_pgentry_t nl1e,
>>  }
>>  
>>  /* Translate foreign guest address. */
>> -if ( paging_mode_translate(pg_dom) )
>> +if ( cmd != MMU_PT_UPDATE_NO_TRANSLATE &&
>> + paging_mode_translate(pg_dom) )
>>  {
>>  p2m_type_t p2mt;
>>  p2m_query_t q = l1e_get_flags(nl1e) & _PAGE_RW ?
> Now that they're public - it was this change which led to the
> recognition of the issue XSA-248 describes (which in turn led to the
> other three). Without the fix for XSA-248 you'd have introduced a
> worse issue here, allowing writable mappings of page table pages
> rather than just r/o ones (leading to hypervisor crashes).
>
> Especially with the bypass of acquiring a writable page ref in
> get_page_from_l1e() for domains controlling shadow-external
> domains we need to be extremely careful with assigning page
> ownership. Before this series goes in I'd therefor like to ask you and
> others (especially people on the Cc list) to double check that the
> bypass introduced above doesn't allow for other (security) badness.
> I think I've sufficiently convinced myself that it doesn't, but this
> clearly wants double checking.

Perhaps it is worth stepping back and considering the usecase from first
principles.

We are deliberately trying to introducing a mechanism whereby a
toolstack/device-mode/other semi-privileged entity can map resources
belonging to a guest which are not part of the guests physmap.  This is
because we deliberately want to move things like emulator rings out of
the guest physmap for attack surface reduction purposes.

On top of that, it would be far more simple if the mechanism by which
this is achieved was compatible with the existing mapping interfaces. 
One way or another, a PV guest needs to be able to construct a PTE for
these frames, and HVM guests need to be able to add these frames to its
physmap, and this looks very similar to foreign mapping.

Other thoughts/suggestions welcome.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH] x86/mm: drop bogus assertion

2017-12-12 Thread Andrew Cooper

On 08/12/17 08:19, Tim Deegan wrote:
> Hi,
>
> At 02:31 -0700 on 06 Dec (1512527481), Jan Beulich wrote:
>>>>> On 30.11.17 at 10:10,  wrote:
>>> i.e. the guest specified a runstate area address which has a non-present
>>> mapping in the page tables [EC=0002 CR2=88003d405220], but that's
>>> not something the hypervisor needs to be concerned about.) Release
>>> builds work fine, which is a first indication that the assertion isn't
>>> really needed.
> Yep, this assertion should just go away, so:
>
> Reviewed-by: Tim Deegan 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/efer: Make {read, write}_efer() into inline helpers

2017-12-13 Thread Andrew Cooper

There is no need for the overhead of a call to a separate translation unit.
While moving the implementation, update them to use uint64_t over u64

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/traps.c  | 13 +
 xen/include/asm-x86/msr.h | 14 +++---
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8a80cd9..db16a44 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -93,7 +93,7 @@ static char __read_mostly opt_nmi[10] = "fatal";
 #endif
 string_param("nmi", opt_nmi);
 
-DEFINE_PER_CPU(u64, efer);
+DEFINE_PER_CPU(uint64_t, efer);
 static DEFINE_PER_CPU(unsigned long, last_extable_addr);
 
 DEFINE_PER_CPU_READ_MOSTLY(u32, ler_msr);
@@ -1718,17 +1718,6 @@ void do_device_not_available(struct cpu_user_regs *regs)
 return;
 }
 
-u64 read_efer(void)
-{
-return this_cpu(efer);
-}
-
-void write_efer(u64 val)
-{
-this_cpu(efer) = val;
-wrmsrl(MSR_EFER, val);
-}
-
 static void ler_enable(void)
 {
 u64 debugctl;
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 41732a4..2fbed02 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -196,9 +196,17 @@ static inline void wrgsbase(unsigned long base)
 wrmsrl(MSR_GS_BASE, base);
 }
 
-DECLARE_PER_CPU(u64, efer);
-u64 read_efer(void);
-void write_efer(u64 val);
+DECLARE_PER_CPU(uint64_t, efer);
+static inline uint64_t read_efer(void)
+{
+return this_cpu(efer);
+}
+
+static inline void write_efer(uint64_t val)
+{
+this_cpu(efer) = val;
+wrmsrl(MSR_EFER, val);
+}
 
 DECLARE_PER_CPU(u32, ler_msr);
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/2] x86/vmx: Drop enum handler_return

2017-12-13 Thread Andrew Cooper

They are straight aliases of the more common X86EMUL_* constants.  While
adjusting these, fix the case indentation where appropriate.

No functional change, confirmed by diff'ing the compiled binary.

Signed-off-by: Andrew Cooper 
Acked-by: Kevin Tian 
---
CC: Jan Beulich 

v2:
 * Rebase over changes in patch 1
 * Drop redundant case labels
---
 xen/arch/x86/hvm/vmx/vmx.c | 68 +-
 1 file changed, 31 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 73254bf..ed56a04 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -62,8 +62,6 @@
 static bool_t __initdata opt_force_ept;
 boolean_param("force-ept", opt_force_ept);
 
-enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised };
-
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
 
@@ -485,8 +483,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
 passive_domain_destroy(v);
 }
 
-static enum handler_return
-long_mode_do_msr_read(unsigned int msr, uint64_t *msr_content)
+static int long_mode_do_msr_read(unsigned int msr, uint64_t *msr_content)
 {
 struct vcpu *v = current;
 
@@ -521,16 +518,15 @@ long_mode_do_msr_read(unsigned int msr, uint64_t 
*msr_content)
 break;
 
 default:
-return HNDL_unhandled;
+return X86EMUL_UNHANDLEABLE;
 }
 
 HVM_DBG_LOG(DBG_LEVEL_MSR, "msr %#x content %#"PRIx64, msr, *msr_content);
 
-return HNDL_done;
+return X86EMUL_OKAY;
 }
 
-static enum handler_return
-long_mode_do_msr_write(unsigned int msr, uint64_t msr_content)
+static int long_mode_do_msr_write(unsigned int msr, uint64_t msr_content)
 {
 struct vcpu *v = current;
 
@@ -542,7 +538,7 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 case MSR_GS_BASE:
 case MSR_SHADOW_GS_BASE:
 if ( !is_canonical_address(msr_content) )
-return HNDL_exception_raised;
+return X86EMUL_EXCEPTION;
 
 if ( msr == MSR_FS_BASE )
 __vmwrite(GUEST_FS_BASE, msr_content);
@@ -560,14 +556,14 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 
 case MSR_LSTAR:
 if ( !is_canonical_address(msr_content) )
-return HNDL_exception_raised;
+return X86EMUL_EXCEPTION;
 v->arch.hvm_vmx.lstar = msr_content;
 wrmsrl(MSR_LSTAR, msr_content);
 break;
 
 case MSR_CSTAR:
 if ( !is_canonical_address(msr_content) )
-return HNDL_exception_raised;
+return X86EMUL_EXCEPTION;
 v->arch.hvm_vmx.cstar = msr_content;
 break;
 
@@ -577,10 +573,10 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 break;
 
 default:
-return HNDL_unhandled;
+return X86EMUL_UNHANDLEABLE;
 }
 
-return HNDL_done;
+return X86EMUL_OKAY;
 }
 
 /*
@@ -2934,12 +2930,11 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 goto done;
 switch ( long_mode_do_msr_read(msr, msr_content) )
 {
-case HNDL_unhandled:
-break;
-case HNDL_exception_raised:
-return X86EMUL_EXCEPTION;
-case HNDL_done:
-goto done;
+case X86EMUL_EXCEPTION:
+return X86EMUL_EXCEPTION;
+
+case X86EMUL_OKAY:
+goto done;
 }
 
 if ( vmx_read_guest_msr(msr, msr_content) == 0 )
@@ -3158,24 +3153,23 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 
 switch ( long_mode_do_msr_write(msr, msr_content) )
 {
-case HNDL_unhandled:
-if ( (vmx_write_guest_msr(msr, msr_content) != 0) &&
- !is_last_branch_msr(msr) )
-switch ( wrmsr_hypervisor_regs(msr, msr_content) )
-{
-case -ERESTART:
-return X86EMUL_RETRY;
-case 0:
-case 1:
-break;
-default:
-goto gp_fault;
-}
-break;
-case HNDL_exception_raised:
-return X86EMUL_EXCEPTION;
-case HNDL_done:
-break;
+case X86EMUL_UNHANDLEABLE:
+if ( (vmx_write_guest_msr(msr, msr_content) != 0) &&
+ !is_last_branch_msr(msr) )
+switch ( wrmsr_hypervisor_regs(msr, msr_content) )
+{
+case -ERESTART:
+return X86EMUL_RETRY;
+case 0:
+case 1:
+break;
+default:
+goto gp_fault;
+}
+break;
+
+case X86EMUL_EXCEPTION:
+return X86EMUL_EXCEPTI

[Xen-devel] [PATCH v2 1/2] x86/vmx: Don't use hvm_inject_hw_exception() in long_mode_do_msr_write()

2017-12-13 Thread Andrew Cooper

Since c/s 49de10f3c1718 "x86/hvm: Don't raise #GP behind the emulators back
for MSR accesses", returning X86EMUL_EXCEPTION has pushed the exception
generation to the top of the call tree.

Using hvm_inject_hw_exception() and returning X86EMUL_EXCEPTION causes a
double #GP injection, which combines to #DF.

Signed-off-by: Andrew Cooper 
Acked-by: Kevin Tian 
---
CC: Jan Beulich 

v2:
 * Drop uncanonical_address and return HNDL_exception_raised directly

This wants backporting to 4.9
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b18ccea..73254bf 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -542,7 +542,7 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 case MSR_GS_BASE:
 case MSR_SHADOW_GS_BASE:
 if ( !is_canonical_address(msr_content) )
-goto uncanonical_address;
+return HNDL_exception_raised;
 
 if ( msr == MSR_FS_BASE )
 __vmwrite(GUEST_FS_BASE, msr_content);
@@ -560,14 +560,14 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 
 case MSR_LSTAR:
 if ( !is_canonical_address(msr_content) )
-goto uncanonical_address;
+return HNDL_exception_raised;
 v->arch.hvm_vmx.lstar = msr_content;
 wrmsrl(MSR_LSTAR, msr_content);
 break;
 
 case MSR_CSTAR:
 if ( !is_canonical_address(msr_content) )
-goto uncanonical_address;
+return HNDL_exception_raised;
 v->arch.hvm_vmx.cstar = msr_content;
 break;
 
@@ -581,11 +581,6 @@ long_mode_do_msr_write(unsigned int msr, uint64_t 
msr_content)
 }
 
 return HNDL_done;
-
- uncanonical_address:
-HVM_DBG_LOG(DBG_LEVEL_MSR, "Not cano address of msr write %x", msr);
-hvm_inject_hw_exception(TRAP_gp_fault, 0);
-return HNDL_exception_raised;
 }
 
 /*
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/domctl: Avoid redundant zeroing in XEN_DOMCTL_get_vcpu_msrs

2017-12-13 Thread Andrew Cooper

Zero the msr structure once at initialisation time, and avoid re-zeroing the
reserved field every time the structure is used.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/domctl.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 075ee92..036fdcd 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1284,7 +1284,7 @@ long arch_do_domctl(
 case XEN_DOMCTL_set_vcpu_msrs:
 {
 struct xen_domctl_vcpu_msrs *vmsrs = &domctl->u.vcpu_msrs;
-struct xen_domctl_vcpu_msr msr;
+struct xen_domctl_vcpu_msr msr = {};
 struct vcpu *v;
 static const uint32_t msrs_to_send[] = {
 MSR_INTEL_MISC_FEATURES_ENABLES,
@@ -1347,7 +1347,6 @@ long arch_do_domctl(
 if ( i < vmsrs->msr_count && !ret )
 {
 msr.index = msrs_to_send[j];
-msr.reserved = 0;
 msr.value = val;
 if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
 ret = -EFAULT;
@@ -1362,7 +1361,6 @@ long arch_do_domctl(
 if ( i < vmsrs->msr_count && !ret )
 {
 msr.index = MSR_AMD64_DR0_ADDRESS_MASK;
-msr.reserved = 0;
 msr.value = v->arch.pv_vcpu.dr_mask[0];
 if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) 
)
 ret = -EFAULT;
@@ -1377,7 +1375,6 @@ long arch_do_domctl(
 if ( i < vmsrs->msr_count && !ret )
 {
 msr.index = MSR_AMD64_DR1_ADDRESS_MASK + j;
-msr.reserved = 0;
 msr.value = v->arch.pv_vcpu.dr_mask[1 + j];
 if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) 
)
 ret = -EFAULT;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] docs/process/xen-release-management: Lesson to learn

2017-12-13 Thread Andrew Cooper

On 13/12/17 12:02, Ian Jackson wrote:
> The 4.10 release preparation was significantly more hairy than ideal.
> (We seem to have a good overall outcome despite, rather than because
> of, our approach.)
>
> This is the second time (at least) that we have come close to failure
> by committing to a release date before the exact code to be released
> is known and has been made and tested.
>
> Evidently our docs makes it insufficiently clear not to do that.
>
> CC: Lars Kurth 
> CC: Julien Grall 
> CC: Juergen Gross 
> Signed-off-by: Ian Jackson 
> ---
>  docs/process/xen-release-management.pandoc | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/docs/process/xen-release-management.pandoc 
> b/docs/process/xen-release-management.pandoc
> index 2ff0665..eee5dcf 100644
> --- a/docs/process/xen-release-management.pandoc
> +++ b/docs/process/xen-release-management.pandoc
> @@ -211,6 +211,11 @@ https://wiki.xenproject.org/wiki/Category:Xen_4.9
>  Ask them to dry-run their checklist and confirm everything is OK. If not,
>  arrange another RC and restart this checklist.
>  
> +7. Do not commit to a release date until
> +
> +* The exact xen.git commit id to be released is known.
> +* That commit id has been satisfactorily tested.
> +
>  7. Give PR Personnel final go-ahead, and instruct Release Technician to make

s/7/8/

~Andrew

>  release deliverables (tags and tarballs - will usually be in place the day
>  before the release). At this point, PR collateral will be sent to reporters


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/microcode: Add support for fam17h microcode loading

2017-12-13 Thread Andrew Cooper

From: Tom Lendacky 

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes.  Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.

Signed-off-by: Tom Lendacky 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Borislav Petkov 
Signed-off-by: Ingo Molnar 
[Linux commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf]

Ported to Xen.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/microcode_amd.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index b54b0b9..53f9f54 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -107,6 +107,7 @@ static bool_t verify_patch_size(uint32_t patch_size)
 #define F14H_MPB_MAX_SIZE 1824
 #define F15H_MPB_MAX_SIZE 4096
 #define F16H_MPB_MAX_SIZE 3458
+#define F17H_MPB_MAX_SIZE 3200
 
 switch (boot_cpu_data.x86)
 {
@@ -119,6 +120,9 @@ static bool_t verify_patch_size(uint32_t patch_size)
 case 0x16:
 max_size = F16H_MPB_MAX_SIZE;
 break;
+case 0x17:
+max_size = F17H_MPB_MAX_SIZE;
+break;
 default:
 max_size = F1XH_MPB_MAX_SIZE;
 break;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] xen/efi: Fix build with clang-5.0

2017-12-13 Thread Andrew Cooper

The clang-5.0 build is reliably failing with:

  Error: size of boot.o:.text is 0x01

which is because efi_arch_flush_dcache_area() exists as a single ret
instruction.  Mark it as __init like everything else in the files.

Spotted by Travis.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Julien Grall 

For future discussion, Why are these implementations not inline?  Why are we
being special and allowing a header file and define functions and globals like
this?
---
 xen/arch/arm/efi/efi-boot.h | 2 +-
 xen/arch/x86/efi/efi-boot.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
index 56de26e..ca655ff 100644
--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -597,7 +597,7 @@ static void __init 
efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
 {
 }
 
-static void efi_arch_flush_dcache_area(const void *vaddr, UINTN size)
+static void __init efi_arch_flush_dcache_area(const void *vaddr, UINTN size)
 {
 __flush_dcache_area(vaddr, size);
 }
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 8d295ff..d30f688 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -668,7 +668,7 @@ static bool __init 
efi_arch_use_config_file(EFI_SYSTEM_TABLE *SystemTable)
 return true; /* x86 always uses a config file */
 }
 
-static void efi_arch_flush_dcache_area(const void *vaddr, UINTN size) { }
+static void __init efi_arch_flush_dcache_area(const void *vaddr, UINTN size) { 
}
 
 void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE 
*SystemTable)
 {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RESEND/PING] timers: limit heap size

2019-09-02 Thread Andrew Cooper

On 30/08/2019 14:33, Jan Beulich wrote:
> First and foremost make timer_softirq_action() avoid growing the heap
> if its new size can't be stored without truncation. 64k entries is a
> lot, and I don't think we're at risk of actually running into the issue,
> but I also think it's better not to allow for hard to debug problems to
> occur in the first place.
>
> Furthermore also adjust the code such the size/limit fields becoming
> unsigned int would at least work from a mere sizing point of view. For
> this also switch various uses of plain int to unsigned int.
>
> Signed-off-by: Jan Beulich 
> Reviewed-by: Roger Pau MonnÃ© 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RESEND/PING] core-parking: interact with runtime SMT-disabling

2019-09-02 Thread Andrew Cooper

On 30/08/2019 14:33, Jan Beulich wrote:
> When disabling SMT at runtime, secondary threads should no longer be
> candidates for bringing back up in response to _PUD ACPI events. Purge
> them from the tracking array.

I think I agree in principle, but what are _PUD events?  I can't find
any reference to them at all in the ACPI spec.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/3] x86/ACPI: restore VESA mode upon resume from S3

2019-09-02 Thread Andrew Cooper

On 30/08/2019 14:41, Jan Beulich wrote:
> In order for "acpi_sleep=s3_mode" to have any effect, we should record
> the video mode we switched to during boot. Since right now there's mode
> setting code for VESA modes only in the resume case, record the mode
> just in that one case.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

> ---
> I'm wondering actually whether the user having to explicitly request the
> mode restoration is a good model: Why would we _not_ want to restore the
> mode we've set during boot? In the worst case Dom0 kernel or X will
> change the mode another time.

By this, I presume you mean drop the acpi_sleep option entirely?

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86: a little bit of 16-bit video mode setting code cleanup

2019-09-02 Thread Andrew Cooper

On 30/08/2019 14:41, Jan Beulich wrote:
> To "compensate" for the code size growth by an earlier change:
> - drop "trampoline" labels (in almost all cases the target label is
>   reachable with an 8-bit-displacement branch anyway, and a single 16-
>   bit-displacement branch is still better than a pair of two branches)
> - drop an entirely dead insn from wakeup.S:mode_setw
> - reduce code size in a few other (obvious I hope) cases, by more
>   suitable insn/operands selection
>
> Also drop redundant #define-s (move suitable #include a little earlier
> instead) and add two alignment directives.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/3] x86: shrink video_{flags, mode} to {8, 16} bits

2019-09-02 Thread Andrew Cooper

On 30/08/2019 14:42, Jan Beulich wrote:
> We really don't need them to be any wider.
>
> Also remove the C level declaration (and hence also the GLOBAL) of
> video_mode altogether; it's used in assembly code only.
>
> Suggested-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/2] x86/acpi: Drop sleep_states[] and associated print messages

2019-09-02 Thread Andrew Cooper

sleep_states[] is a write-only array, and despite the loop logic, the printed
message is consistently "ACPI sleep modes: S3".  Drop it all.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/arch/x86/acpi/power.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index d83e8cdd52..6ae9e29229 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -36,7 +36,6 @@ uint32_t system_reset_counter = 1;
 static char __initdata opt_acpi_sleep[20];
 string_param("acpi_sleep", opt_acpi_sleep);
 
-static u8 sleep_states[ACPI_S_STATE_COUNT];
 static DEFINE_SPINLOCK(pm_lock);
 
 struct acpi_sleep_info acpi_sinfo;
@@ -460,7 +459,6 @@ acpi_status acpi_enter_sleep_state(u8 sleep_state)
 
 static int __init acpi_sleep_init(void)
 {
-int i;
 char *p = opt_acpi_sleep;
 
 while ( (p != NULL) && (*p != '\0') )
@@ -474,19 +472,6 @@ static int __init acpi_sleep_init(void)
 p += strspn(p, ", \t");
 }
 
-printk(XENLOG_INFO "ACPI sleep modes:");
-for ( i = 0; i < ACPI_S_STATE_COUNT; i++ )
-{
-if ( i == ACPI_STATE_S3 )
-{
-sleep_states[i] = 1;
-printk(" S%d", i);
-}
-else
-sleep_states[i] = 0;
-}
-printk("\n");
-
 return 0;
 }
 __initcall(acpi_sleep_init);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] x86/apci: Adjust command line parsing for "acpi_sleep"

2019-09-02 Thread Andrew Cooper

Perform parsing in a custom_param, rather than stashing the content in a
string and parsing in an initcall.  Adjust the parsing to conform to current
standards.

No practical change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 

The reason that flags is pulled into a local variable is that the codegen for
acpi_video_flags is attrocious, 260 bytes!, and doubles up when used twice.
---
 xen/arch/x86/acpi/power.c | 47 ++-
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 6ae9e29229..1ce5de221e 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -33,8 +33,32 @@
 
 uint32_t system_reset_counter = 1;
 
-static char __initdata opt_acpi_sleep[20];
-string_param("acpi_sleep", opt_acpi_sleep);
+static int parse_acpi_sleep(const char *s)
+{
+const char *ss;
+unsigned int flag = 0;
+int rc = 0;
+
+do {
+ss = strchr(s, ',');
+if ( !ss )
+ss = strchr(s, '\0');
+
+if ( !cmdline_strcmp(s, "s3_bios") )
+flag |= 1;
+else if ( !cmdline_strcmp(s, "s3_mode") )
+flag |= 2;
+else
+rc = -EINVAL;
+
+s = ss + 1;
+} while ( *ss );
+
+acpi_video_flags = flag;
+
+return 0;
+}
+custom_runtime_param("acpi_sleep", parse_acpi_sleep);
 
 static DEFINE_SPINLOCK(pm_lock);
 
@@ -456,22 +480,3 @@ acpi_status acpi_enter_sleep_state(u8 sleep_state)
 
 return_ACPI_STATUS(AE_OK);
 }
-
-static int __init acpi_sleep_init(void)
-{
-char *p = opt_acpi_sleep;
-
-while ( (p != NULL) && (*p != '\0') )
-{
-if ( !strncmp(p, "s3_bios", 7) )
-acpi_video_flags |= 1;
-if ( !strncmp(p, "s3_mode", 7) )
-acpi_video_flags |= 2;
-p = strchr(p, ',');
-if ( p != NULL )
-p += strspn(p, ", \t");
-}
-
-return 0;
-}
-__initcall(acpi_sleep_init);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] x86/apci: Adjust command line parsing for "acpi_sleep"

2019-09-02 Thread Andrew Cooper

Perform parsing in a custom_param, rather than stashing the content in a
string and parsing in an initcall.  Adjust the parsing to conform to current
standards.

No practical change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 

The reason that flags is pulled into a local variable is that the codegen for
acpi_video_flags is attrocious, 260 bytes!, and doubles up when used twice.
---
 xen/arch/x86/acpi/power.c | 47 ++-
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 6ae9e29229..414bda205d 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -33,8 +33,32 @@
 
 uint32_t system_reset_counter = 1;
 
-static char __initdata opt_acpi_sleep[20];
-string_param("acpi_sleep", opt_acpi_sleep);
+static int __init parse_acpi_sleep(const char *s)
+{
+const char *ss;
+unsigned int flag = 0;
+int rc = 0;
+
+do {
+ss = strchr(s, ',');
+if ( !ss )
+ss = strchr(s, '\0');
+
+if ( !cmdline_strcmp(s, "s3_bios") )
+flag |= 1;
+else if ( !cmdline_strcmp(s, "s3_mode") )
+flag |= 2;
+else
+rc = -EINVAL;
+
+s = ss + 1;
+} while ( *ss );
+
+acpi_video_flags = flag;
+
+return 0;
+}
+custom_param("acpi_sleep", parse_acpi_sleep);
 
 static DEFINE_SPINLOCK(pm_lock);
 
@@ -456,22 +480,3 @@ acpi_status acpi_enter_sleep_state(u8 sleep_state)
 
 return_ACPI_STATUS(AE_OK);
 }
-
-static int __init acpi_sleep_init(void)
-{
-char *p = opt_acpi_sleep;
-
-while ( (p != NULL) && (*p != '\0') )
-{
-if ( !strncmp(p, "s3_bios", 7) )
-acpi_video_flags |= 1;
-if ( !strncmp(p, "s3_mode", 7) )
-acpi_video_flags |= 2;
-p = strchr(p, ',');
-if ( p != NULL )
-p += strspn(p, ", \t");
-}
-
-return 0;
-}
-__initcall(acpi_sleep_init);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] x86/AMD: Fix handling of x87 exception pointers on Fam17h hardware

2019-09-02 Thread Andrew Cooper

On 29/08/2019 13:56, Jan Beulich wrote:
> On 19.08.2019 20:26, Andrew Cooper wrote:
>> AMD Pre-Fam17h CPUs "optimise" {F,}X{SAVE,RSTOR} by not saving/restoring
>> FOP/FIP/FDP if an x87 exception isn't pending.  This causes an information
>> leak, CVE-2006-1056, and worked around by several OSes, including Xen.  AMD
>> Fam17h CPUs no longer have this leak, and advertise so in a CPUID bit.
>>
>> Introduce the RSTR_FP_ERR_PTRS feature, as specified by AMD, and expose to 
>> all
>> guests by default.  While adjusting libxl's cpuid table, add CLZERO which
>> looks to have been omitted previously.
>>
>> Also introduce an X86_BUG bit to trigger the (F)XRSTOR workaround, and set it
>> on AMD hardware where RSTR_FP_ERR_PTRS is not advertised.  Optimise the
>> workaround path by dropping the data-dependent unpredictable conditions which
>> will evalute to true for all 64bit OSes and most 32bit ones.
> I definitely don't buy the "all 64bit OSes" part here: Anyone doing
> full 80-bit FP operations will have to use the FPU, and hence may
> want to have some unmasked exceptions.

And all 0 people doing that is still 0.

Yes I'm being a little facetious, but there is exceedingly little
software which uses 80-bit FPU operations these days, as it has been
superseded by SSE.

>  I'm also not sure why you
> call them "unpredictable": If all (or most) cases match, the branch
> there could be pretty well predicted (subject of course to capacity).

Data-dependent branches which have no correlation to pattern history, of
which this is an example, are frequently mispredicted because they are
inherently unstable.

In this case, you're trading off the fact that an unmasked exception is
basically never pending, against the cost of mispredicts in the context
switch path.

> All in all I'd prefer if the conditions remained in place; my minimal
> request would be for there to be a comment why there's no evaluation
> of FSW/FCW.
>
>> --- a/xen/arch/x86/i387.c
>> +++ b/xen/arch/x86/i387.c
>> @@ -43,20 +43,17 @@ static inline void fpu_fxrstor(struct vcpu *v)
>>  const typeof(v->arch.xsave_area->fpu_sse) *fpu_ctxt = v->arch.fpu_ctxt;
>>  
>>  /*
>> - * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
>> + * Some CPUs don't save/restore FDP/FIP/FOP unless an exception
> Are there any non-AMD CPUs known to have this issue? If not, is
> there a particular reason you don't say "Some AMD CPUs ..."?

I'm not aware of any, but leaving it as "Some AMD" might become stale if
others do surface.

Information about which CPUs are affected should exclusively be
determined by the logic which sets cpu_bug_fpu_ptr_leak, which won't be
stale.

>>   * is pending. Clear the x87 state here by setting it to fixed
>>   * values. The hypervisor data segment can be sometimes 0 and
>>   * sometimes new user value. Both should be ok. Use the FPU saved
>>   * data block as a safe address because it should be in L1.
>>   */
>> -if ( !(fpu_ctxt->fsw & ~fpu_ctxt->fcw & 0x003f) &&
>> - boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
>> -{
>> +if ( cpu_bug_fpu_ptr_leak )
>>  asm volatile ( "fnclex\n\t"
>> "ffree %%st(7)\n\t" /* clear stack tag */
>> "fildl %0"  /* load to clear state */
>> : : "m" (*fpu_ctxt) );
> If here and in the respective xsave instance you'd use alternatives
> patching, I wouldn't mind the use of a X86_BUG_* for this (as made
> possible by patch 1).

a) this should probably be a static branch if/when we gain that
infrastructure, but ...

> But as said before, just like for synthetic
> features I strongly think we should use simple boolean variables
> when using them only in if()-s. Use of the feature(/bug) machinery
> is needed only to not further complicate alternatives patching.

... b)

I see I'm going to have to repeat myself, which is time I can't really
afford to waste.

x86_capabilities is not, and has never been, "just for alternatives". 
It is also not how it is currently used in Xen.

I also don't agree with the general suggestion because amongst other
things, there is a factor of 8 storage difference between one extra bit
in x86_capabilities[] and using scattered booleans.

This series, and a number of related series, have been overdue for more
than a year now, partly because of speculative preemption, but also
partly because of attempted scope creep such as this.  Scope creep is
having a catastrophic e

[Xen-devel] [PATCH 1/2] tools/shim: Fix race condition creating linkfarm.stamp

2019-09-02 Thread Andrew Cooper

In the case the while loop gets interrupted, the target musn't appear as
up-to-date.  The mov $X.tmp $X must be the last action of the rule.

Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: George Dunlap 
CC: Sander Eikelenboom 
---
 tools/firmware/xen-dir/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/xen-dir/Makefile b/tools/firmware/xen-dir/Makefile
index 697bbbd57b..df3f5a7006 100644
--- a/tools/firmware/xen-dir/Makefile
+++ b/tools/firmware/xen-dir/Makefile
@@ -32,9 +32,9 @@ linkfarm.stamp: $(DEP_DIRS) $(DEP_FILES) FORCE
echo $(f) >> linkfarm.stamp.tmp ;)
cmp -s linkfarm.stamp.tmp linkfarm.stamp && \
rm linkfarm.stamp.tmp || { \
+   cat linkfarm.stamp.tmp | while read f; \
+ do rm -f "$(D)/$$f"; ln -s "$(XEN_ROOT)/$$f" "$(D)/$$f"; 
done; \
mv linkfarm.stamp.tmp linkfarm.stamp; \
-   cat linkfarm.stamp | while read f; \
- do rm -f "$(D)/$$f"; ln -s "$(XEN_ROOT)/$$f" "$(D)/$$f"; done 
\
}
 
 # Copy enough of the tree to build the shim hypervisor
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] tools/shim: Apply more duct tape to the linkfarm logic

2019-09-02 Thread Andrew Cooper

Sander reported a build failure which manifests as `make; make install`
failing with:

  /cross-install -m0644 -p xen-dir/xen-shim 
//usr/local/lib/xen/boot/xen-shim
  install: cannot stat 'xen-dir/xen-shim': No such file or directory
  make[4]: *** [Makefile:52: install] Error 1
  make[4]: Leaving directory '/usr/src/new/xen-unstable/tools/firmware'

It has subsequently been seen intermittently by OSSTest.  This was caused by
c/s 32b1d628 triggering a preexisting linkfarm bug for partial rebuilds.

Between the first `make` and the subsequent `make install`, the linkfarm logic
observes new final build products and regenerates the linkfarm.  This includes
a distclean, which throws away everything from the first `make`.

As the xen-shim rule use a symlink, the link itself remains still up-to-date
but is broken due to the distclean, which causes install to fail.

Update the linkfarm logic to not regenerate itself when build artefacts
appear.  This isn't a comprehensive fix but is the best which can be done
easily.  Any further effort would be better spent making out-of-tree builds
work for Xen.

Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: George Dunlap 
CC: Sander Eikelenboom 
---
 tools/firmware/xen-dir/Makefile | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/xen-dir/Makefile b/tools/firmware/xen-dir/Makefile
index df3f5a7006..538931e9b4 100644
--- a/tools/firmware/xen-dir/Makefile
+++ b/tools/firmware/xen-dir/Makefile
@@ -14,6 +14,26 @@ LINK_FILES=Config.mk
 DEP_DIRS=$(foreach i, $(LINK_DIRS), $(XEN_ROOT)/$(i))
 DEP_FILES=$(foreach i, $(LINK_FILES), $(XEN_ROOT)/$(i))
 
+# Exclude some intermediate files and final build products
+LINK_EXCLUDES := '*.[isoa]' '.*.d' '.*.d2' '.config'
+LINK_EXCLUDES += '*.map' 'xen' 'xen.gz' 'xen.efi' 'xen-syms'
+
+# This is all a giant mess and doesn't really work.
+#
+# The correct solution is to fix Xen to be able to do out-of-tree builds.
+#
+# Until that happens, we set up a linkfarm by iterating over the xen/ tree,
+# linking source files.  This is repeated each time we enter this directory,
+# which poses a problem for a two-step "make; make install" build process.
+#
+# Any time the list of files to link changes, we relink all files, then
+# distclean to take out not-easy-to-classify intermediate files.  This is to
+# support easy development of the shim, but has a side effect of clobbering
+# the already-built shim.
+#
+# $(LINK_EXCLUDES) should be set such that a parallel build of shim and xen/
+# doesn't cause a subsequent `make install` to decide to regenerate the
+# linkfarm.  This means that all final build artefacts must be excluded.
 linkfarm.stamp: $(DEP_DIRS) $(DEP_FILES) FORCE
mkdir -p $(D)
rm -f linkfarm.stamp.tmp
@@ -25,8 +45,7 @@ linkfarm.stamp: $(DEP_DIRS) $(DEP_FILES) FORCE
sed 's,^$(XEN_ROOT)/$(d)/,,g' | xargs mkdir -p .);) \
$(foreach d, $(LINK_DIRS), \
(cd $(XEN_ROOT); \
-find $(d) ! -type l -type f \
-$(addprefix ! -name , '*.[isoa]' '.*.d' '.*.d2')) \
+find $(d) ! -type l -type f $(addprefix ! -name 
,$(LINK_EXCLUDES))) \
 >> linkfarm.stamp.tmp ; ) \
$(foreach f, $(LINK_FILES), \
echo $(f) >> linkfarm.stamp.tmp ;)
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 0/2] tools/shim: Bodge things harder

2019-09-02 Thread Andrew Cooper

This logic is all terrible.  This series should resolve the reported build
failure, but definitely isn't a "proper" fix.

Andrew Cooper (2):
  tools/shim: Fix race condition creating linkfarm.stamp
  tools/shim: Apply more duct tape to the linkfarm logic

 tools/firmware/xen-dir/Makefile | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 1/8] x86emul: support WBNOINVD

2019-09-03 Thread Andrew Cooper

On 03/09/2019 10:37, Jan Beulich wrote:
> Rev 037 of Intel's ISA extensions document does not state intercept
> behavior for the insn (I've been unofficially told that the distinction
> is going to be by exit qualification, as I would have assumed
> considering that this way it's sufficiently transparent to unaware
> software, as using WBINVD in place of WBNOINVD is always correct, just
> less efficient). Similarly AMD's PM volume 2 version 3.31 only states
> that both use the same VMEXIT, but not how to distinugish them (other
> than by decoding the insn). Therefore in the HVM case for now it'll be
> backed by the same ->wbinvd_intercept() handlers.
>
> Use this occasion and also add the two missing table entries for
> CLDEMOTE, which doesn't require any further changes to make work.
>
> Signed-off-by: Jan Beulich 
> Reviewed-by: Paul Durrant 

Reviewed-by: Andrew Cooper 

I've put enabling WBNOINVD on the todo list, but there is still 0
feedback on the "docs/sphinx: todo/wishlist" thread towards the general
principle.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/8] x86/HVM: ignore guest INVD uses

2019-09-03 Thread Andrew Cooper

On 03/09/2019 10:37, Jan Beulich wrote:
> The only place we'd expect the insn to be sensibly used is in
> (virtualization unaware) firmware.
>
> Suggested-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 
> ---
> v3: New.
>
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2210,11 +2210,18 @@ static int hvmemul_cache_op(
>  
>  hvmemul_unmap_linear_addr(mapping, addr, 0, hvmemul_ctxt);
>  /* fall through */
> -case x86emul_invd:
>  case x86emul_wbinvd:
>  case x86emul_wbnoinvd:
>  alternative_vcall(hvm_funcs.wbinvd_intercept);
>  break;
> +
> +case x86emul_invd:
> +/*
> + * Deliberately ignored: We don't want to issue INVD, and issuing 
> WBINVD

I'd phrase this more strongly.  We absolutely must not issue INVD or we
break cache coherency.

Ideally with this adjusted, Reviewed-by: Andrew Cooper


> + * wouldn't match the request. And the only place we'd expect the 
> insn
> + * to be sensibly used is in (virtualization unaware) firmware.
> + */
> +break;
>  }
>  
>  return X86EMUL_OKAY;
>


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 5/8] x86emul: support MOVDIR{I, 64B} insns

2019-09-03 Thread Andrew Cooper

On 03/09/2019 10:39, Jan Beulich wrote:
> Note that SDM revision 070 doesn't specify exception behavior for
> ModRM.mod != 0b11; assuming #UD here.
>
> Signed-off-by: Jan Beulich 

What are we going to do about the ->write() hook atomicity?  I'm happy
to put it on the TODO list, but we can't simply ignore the problem.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 7/8] x86emul: support RDPRU

2019-09-03 Thread Andrew Cooper

On 03/09/2019 10:41, Jan Beulich wrote:
> While the PM doesn't say so, this assumes that the MPERF value read this
> way gets scaled similarly to its reading through RDMSR.
> 
> Signed-off-by: Jan Beulich 

This wants the following hunks merged, to at least keep the
intercept/exit codes up to date.  This is from my alternative series
which intercepted and terminated RDPRU with #UD.

diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h
b/xen/include/asm-x86/hvm/svm/vmcb.h
index 5c710286f7..2bf0d50f6d 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -76,7 +76,8 @@ enum GenericIntercept2bits
 GENERAL2_INTERCEPT_MONITOR = 1 << 10,
 GENERAL2_INTERCEPT_MWAIT   = 1 << 11,
 GENERAL2_INTERCEPT_MWAIT_CONDITIONAL = 1 << 12,
-GENERAL2_INTERCEPT_XSETBV  = 1 << 13
+GENERAL2_INTERCEPT_XSETBV  = 1 << 13,
+GENERAL2_INTERCEPT_RDPRU   = 1 << 14,
 };


@@ -300,6 +301,7 @@ enum VMEXIT_EXITCODE
 VMEXIT_MWAIT= 139, /* 0x8b */
 VMEXIT_MWAIT_CONDITIONAL= 140, /* 0x8c */
 VMEXIT_XSETBV   = 141, /* 0x8d */
+VMEXIT_RDPRU= 142, /* 0x8e */
 VMEXIT_NPF  = 1024, /* 0x400, nested paging fault */
 VMEXIT_INVALID  =  -1
 };


> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -545,6 +545,11 @@ void recalculate_cpuid_policy(struct dom
>  
>  p->extd.maxlinaddr = p->extd.lm ? 48 : 32;
>  
> +if ( p->extd.rdpru )
> +p->extd.rdpru_max = min(p->extd.rdpru_max, max->extd.rdpru_max);
> +else
> +p->extd.rdpru_max = 0;
> +
>  recalculate_xstate(p);
>  recalculate_misc(p);

The CPUID logic needs quite a bit more than this, and to be safe on
migrate.  For one, recalculate_xstate() unilaterally clobbers this to 0.

Shall I do a prep patch getting the CPUID side of things in order?

>  
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -5670,6 +5671,52 @@ x86_emulate(
>  limit -= sizeof(zero);
>  }
>  break;
> +
> +case 0xfd: /* rdpru */
> +vcpu_must_have(rdpru);
> +
> +if ( !mode_ring0() )
> +{
> +fail_if(!ops->read_cr);
> +if ( (rc = ops->read_cr(4, &cr4, ctxt)) != X86EMUL_OKAY )
> +goto done;
> +generate_exception_if(cr4 & X86_CR4_TSD, EXC_UD);
> +}
> +
> +switch ( _regs.ecx )
> +{
> +case 0:  n = MSR_IA32_MPERF; break;
> +case 1:  n = MSR_IA32_APERF; break;
> +default: n = 0; break;
> +}
> +if ( _regs.ecx > ctxt->cpuid->extd.rdpru_max )
> +n = 0;

This can be folded into the switch statement.  Something like (
_regs.ecx | -(_regs.ecx > ctxt->cpuid->extd.rdpru_max) )

Also, the sentinel might better be -1, which is not in a defined MSR
block.  MSR 0 is a P5-compat MCE MSR, even on AMD hardware.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 02/11] ioreq: terminate cf8 handling at hypervisor level

2019-09-03 Thread Andrew Cooper

On 03/09/2019 17:14, Roger Pau Monne wrote:
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 692b710b02..69652e1080 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -1015,6 +1015,12 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, 
> ioservid_t id,
>  switch ( type )
>  {
>  case XEN_DMOP_IO_RANGE_PORT:
> +rc = -EINVAL;
> +/* PCI config space accesses are handled internally. */
> +if ( start <= 0xcf8 + 8 && 0xcf8 <= end )
> +goto out;
> +else
> +/* fallthrough. */

You need to drop the else, or it may still trigger warnings.

Furthermore, qemu registers cf8-cff so I think you need some fix-ups
there first before throwing errors back here.

Finally, this prohibits registering cf9 which may legitimately not be
terminated in Xen.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] x86/AMD: Fix handling of x87 exception pointers on Fam17h hardware

2019-09-03 Thread Andrew Cooper

On 02/09/2019 15:50, Jan Beulich wrote:
>>>  I'm also not sure why you
>>> call them "unpredictable": If all (or most) cases match, the branch
>>> there could be pretty well predicted (subject of course to capacity).
>> Data-dependent branches which have no correlation to pattern history, of
>> which this is an example, are frequently mispredicted because they are
>> inherently unstable.
>>
>> In this case, you're trading off the fact that an unmasked exception is
>> basically never pending, against the cost of mispredicts in the context
>> switch path.
> For
>
> if ( !(fpu_ctxt->fsw & ~fpu_ctxt->fcw & 0x003f) &&
>
> you're claiming it to be true most of the time. How could the
> predictor be mislead if whenever this is encountered the result
> of the double & is zero?

Because whether it is 0 or not is unrelated to previous history.

As this argument isn't getting anywhere, I'll leave it in for now and do
the perf work to demonstrate the problem at some point when I don't have
15 other things needing doing yesterday.

>>> But as said before, just like for synthetic
>>> features I strongly think we should use simple boolean variables
>>> when using them only in if()-s. Use of the feature(/bug) machinery
>>> is needed only to not further complicate alternatives patching.
>> ... b)
>>
>> I see I'm going to have to repeat myself, which is time I can't really
>> afford to waste.
>>
>> x86_capabilities is not, and has never been, "just for alternatives". 
>> It is also not how it is currently used in Xen.
> And I've not been claiming this.

You literally have, and it is quoted above.

>  Nevertheless my opinion is that it
> shouldn't be needlessly abused beyond its main purpose.

The purpose is to be a collection bits, stored in reasonably efficient
manner.  Synthetic features, as well as bugs are related information,
and very definitely capabilities of the CPU.

Alternatives use the x86_capabilities[] bitmap, which existed for 2
decades previously, because it happens to be in a convenient form.  The
fact that alternatives do use x86_capabilities[] has no bearing on what
is reasonable or appropriate data to store in the bitmap, and it
certainly doesn't mean that data-not-used-for-patching should be purged.

> I thought I had successfully convinced you of not adding synthetic
> feature (non-bug) flags either anymore, unless needed for alternatives
> patching.

No.

I don't think you realise how quite how infuriating it was trying to
meet the embargos for speculative issues.  We had series which were 10's
of patches long, being invasively rewritten leading up to the embargo. 
Some requests where legitimate - I'm not going to pretend otherwise, but
some really were minutia like this which really didn't help the situation.

There are two big series outstanding, MSR_VIRT_SPEC_CTRL and CPUID
Policy, which is getting to be reprehensibly late, and both of which had
proper embargos I was trying to meet. 

There was no way VIRT_SPEC_CTRL was going to meet the SSBD embargo
because of the delay getting the spec together, but running Xen on AMD
hardware is currently embarrassing and slow due to guests falling back
to native means and hitting:

(XEN) emul-priv-op.c:1113:d0v2 Domain attempted WRMSR c0011020 from
0x00064040 to 0x000640400400

on their context switch path, and doing a good job of filling /var/log/
in minutes.

CPUID policy is even worse.  It's not currently safe to migrate VMs on
Intel hardware, because we can't level MSR_ARCH_CAPS.RSBA across the
migration pool, and this is something which really should have met the
L1TF embargo a year ago, but which was stopped dead in its tracks
because I couldn't even argue in public as to why it needed to be done
certain ways.  It also means that Xen is crippled on current-generation
Intel hardware.

The sad fact is that it is rather too easy to put off finishing that
work when there is other just-as-important work to do, and the thought
of arguing over further minutia on vN+1 is occasionally too exhausting
to contemplate.

> Anyway - in the interest of forward progress, yet without being
> convinced at all, I'll (as in so many earlier cases) give in here and
> see about acking patch 1 then.

Thankyou.

>
>> I also don't agree with the general suggestion because amongst other
>> things, there is a factor of 8 storage difference between one extra bit
>> in x86_capabilities[] and using scattered booleans.
> While a valid argument at the first glance, there's nothing keeping
> us from having a feature flag independent bitmap. Correct my if I'm
> wrong, but I've gained the impression that you want this mainly
> because Linux does it this way.

To a first approximation, yes - this is a construct we inherited from
Linux, and I'm continuing to use it in the way Linux uses it.

>
>>>  With this, keying the return to cpu_bug_* also doesn't
>>> look very nice, but I admit I can't suggest a better alternative
>>> (other than leaving the vendor check in place and check

Re: [Xen-devel] [xen-unstable test] 140960: regressions - FAIL

2019-09-04 Thread Andrew Cooper

On 04/09/2019 09:36, Jan Beulich wrote:
> On 03.09.2019 22:00, osstest service owner wrote:
>> flight 140960 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/140960/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-xl-pvshim   18 guest-localmigrate/x10   fail REGR. vs. 
>> 139876
> This looks to be recurring, so I've taken another look.

I had a suspicion as well, but fixing the intermittent build problems
was the first priority.

A major change in shim in the range under test is switching from Credit1
to NULL as a scheduler, following Dario's fixing of what we thought was
the final outstanding bug.  Perhaps it wasn't the final bug...

>  The three
> migrations leave this abbreviated pattern in the log:
>
> Sep  3 14:20:42.446667 (XEN) HVM d1v0 save: CPU_MSR
> ...
> Sep  3 14:20:57.850670 (XEN) HVM2 restore: CPU 0
> ...
> Sep  3 14:21:37.062840 (XEN) HVM d2v0 save: CPU_MSR
> Sep  3 14:21:37.062888 (XEN) HVM3 restore: CPU 0
> ...
> Sep  3 14:21:56.438552 (XEN) HVM d3v0 save: CPU_MSR
> ...
> Sep  3 14:22:11.506508 (XEN) HVM4 restore: CPU 0
>
> Therefore I wonder whether the first one got lucky and finished
> barely ahead of timing out, while the 2nd worked instantly and the
> 3rd then ended up exceeding the timeout. What is curious are the
> intermediate log entries (between the last "save" and the first
> corresponding "restore" log entries): Many ones of the form
>
> (XEN) emul-priv-op.c:1113:d0v2 Domain attempted WRMSR c0011020 from 
> 0x to 0x0040

This is due to a lack of MSR_VIRT_SPEC_CTRL.  It is sshd (or systemd on
its behalf - unclear which) using the SSBD prctl to protect itself, and
Xen, having no support, is causing Linux to fall back to native methods
and falling fowl of Xens write/discard policy on MSRs.

> with a 15s gap between the first and many subsequent ones) and
> finally one of the form
>
> [  451.267669] systemd-logind[2766]: New session 39 of user root.
>
> And finally, at around the time of the failed migration
>
> INIT: Id "T0" respawning too fast: disabled for 5 minutes

Googling around suggests it is an inittab misconfiguration.

>
> While it's not clear that this parallel activity is causing the
> migration to progress too slowly, it looks to be a possibility at
> least. Can anyone explain what these are?
>
>>  build-amd64-xsm   6 xen-buildfail REGR. vs. 
>> 139876
> I take it that this is supposed to be taken care of by a342900d48
> ("tools/shim: Apply more duct tape to the linkfarm logic").

Yes - it should do.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 5/5] xen: modify several static locks to unique names

2019-09-04 Thread Andrew Cooper

On 04/09/2019 09:40, Jan Beulich wrote:
> On 04.09.2019 10:25, Juergen Gross wrote:
>> On 03.09.19 17:09, Jan Beulich wrote:
>>> On 03.09.2019 17:03, Juergen Gross wrote:
 On 03.09.19 16:53, Jan Beulich wrote:
> On 29.08.2019 12:18, Juergen Gross wrote:
>> In order to have unique names when doing lock profiling several local
>> locks "lock" need to be renamed.
> But these are all named simply "lock" for a good reason, including
> because they're all function scope symbols (and typically the
> functions are all sufficiently short). The issue stems from the
> dual use of "name" in
>
> #define _LOCK_PROFILE(name) { 0, #name, &name, 0, 0, 0, 0, 0 }
>
> so I'd rather suggest making this a derivation of a new
>
> #define _LOCK_PROFILE_NAME(lock, name) { 0, #name, &lock, 0, 0, 0, 0, 0 }
>
> if there's no other (transparent) way of disambiguating the names.
 This will require to use a different DEFINE_SPINLOCK() variant, so e.g.
 DEFINE_SPINLOCK_LOCAL(), which will then include the needed "static" and
 add "@" to the lock profiling name. Is this okay?
>>> To be frank - not really. I dislike both, and would hence prefer to
>>> stick to what there is currently, until someone invents a transparent
>>> way to disambiguate these. I'm sorry for being unhelpful here.
>> I think I have found a way: I could add __FILE__ and __LINE__ data to
>> struct lock_profile. In lock_prof_init() I could look for non-unique
>> lock names and mark those to be printed with the __FILE__ and __LINE__
>> data added to the names.
>>
>> Would you be fine with this approach?
> I would be, but I'm afraid Andrew won't (as with any new uses of __LINE__).

The ok-ness of using __LINE__ is inversely proportional to the
likelihood of developing a livepatch for this particular build of Xen,
and what additional patching delta it would cause through unrelated changes.

We still have __LINE__ in a few cases in release builds because the
utility has been deemed to outweigh the additional livepatch overhead. 
A specific example is x86_emulate(), where the extra patching delta is 0
given that it is a single function.

For stuff which wouldn't be active in production builds, and unlikely to
be on the majority of debug builds, then it should be fine to use.

The one remaining problematic use of __LINE__ is in domain_crash() and
it needs to disappear because it tends to have a massive
unrelated-patch-delta for release builds.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 5/5] xen: modify several static locks to unique names

2019-09-04 Thread Andrew Cooper

On 04/09/2019 10:11, Juergen Gross wrote:
> On 04.09.19 10:58, Andrew Cooper wrote:
>> On 04/09/2019 09:40, Jan Beulich wrote:
>>> On 04.09.2019 10:25, Juergen Gross wrote:
>>>> On 03.09.19 17:09, Jan Beulich wrote:
>>>>> On 03.09.2019 17:03, Juergen Gross wrote:
>>>>>> On 03.09.19 16:53, Jan Beulich wrote:
>>>>>>> On 29.08.2019 12:18, Juergen Gross wrote:
>>>>>>>> In order to have unique names when doing lock profiling several
>>>>>>>> local
>>>>>>>> locks "lock" need to be renamed.
>>>>>>> But these are all named simply "lock" for a good reason, including
>>>>>>> because they're all function scope symbols (and typically the
>>>>>>> functions are all sufficiently short). The issue stems from the
>>>>>>> dual use of "name" in
>>>>>>>
>>>>>>> #define _LOCK_PROFILE(name) { 0, #name, &name, 0, 0, 0, 0, 0 }
>>>>>>>
>>>>>>> so I'd rather suggest making this a derivation of a new
>>>>>>>
>>>>>>> #define _LOCK_PROFILE_NAME(lock, name) { 0, #name, &lock, 0, 0,
>>>>>>> 0, 0, 0 }
>>>>>>>
>>>>>>> if there's no other (transparent) way of disambiguating the names.
>>>>>> This will require to use a different DEFINE_SPINLOCK() variant,
>>>>>> so e.g.
>>>>>> DEFINE_SPINLOCK_LOCAL(), which will then include the needed
>>>>>> "static" and
>>>>>> add "@" to the lock profiling name. Is this okay?
>>>>> To be frank - not really. I dislike both, and would hence prefer to
>>>>> stick to what there is currently, until someone invents a transparent
>>>>> way to disambiguate these. I'm sorry for being unhelpful here.
>>>> I think I have found a way: I could add __FILE__ and __LINE__ data to
>>>> struct lock_profile. In lock_prof_init() I could look for non-unique
>>>> lock names and mark those to be printed with the __FILE__ and __LINE__
>>>> data added to the names.
>>>>
>>>> Would you be fine with this approach?
>>> I would be, but I'm afraid Andrew won't (as with any new uses of
>>> __LINE__).
>>
>> The ok-ness of using __LINE__ is inversely proportional to the
>> likelihood of developing a livepatch for this particular build of Xen,
>> and what additional patching delta it would cause through unrelated
>> changes.
>
> Not related to this patch, but to __LINE__ and livepatching: have you
> considered to use the "#line" directive to avoid unrelated diffs?

There are ways to play with __LINE__, yes.  #line was brought up in the
original discussion.

As a thought experiment, how would you expect this to be used to
simplify a livepatch?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] Remove stale crashkernel= example from documentation

2019-09-04 Thread Andrew Cooper

On 04/09/2019 10:14, Olaf Hering wrote:
> A plain crashkernel=size is apparently not supported by the code
> anymore. In case kdump ever worked like that, the code which removed
> support for this notation did not update the documentation.
>
> Signed-off-by: Olaf Hering 

That sounds like an accidental regression in parsing of crashkernel=,
rather than a deliberate action.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 7/8] x86emul: support RDPRU

2019-09-04 Thread Andrew Cooper

On 04/09/2019 10:26, Jan Beulich wrote:
> On 03.09.2019 14:34, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/cpuid.c
>>> +++ b/xen/arch/x86/cpuid.c
>>> @@ -545,6 +545,11 @@ void recalculate_cpuid_policy(struct dom
>>>  
>>>  p->extd.maxlinaddr = p->extd.lm ? 48 : 32;
>>>  
>>> +if ( p->extd.rdpru )
>>> +p->extd.rdpru_max = min(p->extd.rdpru_max, max->extd.rdpru_max);
>>> +else
>>> +p->extd.rdpru_max = 0;
>>> +
>>>  recalculate_xstate(p);
>>>  recalculate_misc(p);
>> The CPUID logic needs quite a bit more than this, and to be safe on
>> migrate.  For one, recalculate_xstate() unilaterally clobbers this to 0.
> I've looked again - recalculate_misc() clobbers .a, .b, and .c,
> but not .d afaics.

It is clobbered in the common section at the top.

...
    /* Most of Power/RAS hidden from guests. */
    p->extd.raw[0x7].a = p->extd.raw[0x7].b = p->extd.raw[0x7].c = 0;

    p->extd.raw[0x8].d = 0;

    switch ( p->x86_vendor )
    {
...

> Anyway, just as a note, as you've said you'd
> take care of this anyway, and I'll re-base afterwards.

I looked at this a bit yesterday, and it very ugly opencoding bits of
the policy work without that work.

I've got an idea for just enough skeleton policy work to avoid the
duplicated effort, which I think will be a more sensible way of making
progress.  It will certainly reduce the latency on being able to start
MSR_SPEC_CTRL work.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 5/8] x86emul: support MOVDIR{I, 64B} insns

2019-09-04 Thread Andrew Cooper

On 03/09/2019 13:25, Jan Beulich wrote:
> On 03.09.2019 12:28, Andrew Cooper wrote:
>> On 03/09/2019 10:39, Jan Beulich wrote:
>>> Note that SDM revision 070 doesn't specify exception behavior for
>>> ModRM.mod != 0b11; assuming #UD here.
>>>
>>> Signed-off-by: Jan Beulich 
>> What are we going to do about the ->write() hook atomicity?  I'm happy
>> to put it on the TODO list, but we can't simply ignore the problem.
> So do you not agree with my assessment that our memcpy()
> implementation satisfies the need, and it's just not very
> nice that the ->write() hook is dependent upon this?

We use __builtin_memcpy(), not necessarily our own local implementation.

Our own copy uses rep movsq followed by rep movsb of up to 7 bytes,
which doesn't handle 2 and 4 byte copies atomically.  More generally,
rep movs doesn't provide guarantees about the external visibility of
component writes, owing to the many different ways that such writes may
be implemented, and optimised.

Even if the above were fixed (and I'm not sure could make it correct
even for misaligned writes), we cannot depend on our own memcpy never
changing in the future.  For one, it should be a straight rep movsb on
most Intel hardware these days, for performance.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/shadow: don't enable shadow mode with too small a shadow allocation (part 2)

2019-09-04 Thread Andrew Cooper

On 04/09/2019 08:55, Jan Beulich wrote:
> Commit 2634b997af ("x86/shadow: don't enable shadow mode with too small
> a shadow allocation") was incomplete: The adjustment done there to
> shadow_enable() is also needed in shadow_one_bit_enable(). The (new)
> problem report was (apparently) a failed PV guest migration followed by
> another migration attempt for that same guest. Disabling log-dirty mode
> after the first one had left a couple of shadow pages allocated (perhaps
> something that also wants fixing), and hence the second enabling of
> log-dirty mode wouldn't have allocated anything further.
>
> Reported-by: James Wang 
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -2864,7 +2864,8 @@ static int shadow_one_bit_enable(struct
>  
>  mode |= PG_SH_enable;
>  
> -if ( d->arch.paging.shadow.total_pages == 0 )
> +if ( d->arch.paging.shadow.total_pages <
> + sh_min_allocation(d) + d->arch.paging.shadow.p2m_pages )

This logic looks suspect.  The change from == 0 to < min looks fine, and
feels like the right thing to do.

However,  sh_min_allocation() should by definition be the minimum
quantity of pages necessary for things to function, which makes the + on
the end look wrong.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH -tip 0/2] x86: Prohibit kprobes on XEN_EMULATE_PREFIX

2019-09-04 Thread Andrew Cooper

On 04/09/2019 12:45, Masami Hiramatsu wrote:
> Hi,
>
> These patches allow x86 instruction decoder to decode
> xen-cpuid which has XEN_EMULATE_PREFIX, and prohibit
> kprobes to probe on it.
>
> Josh reported that the objtool can not decode such special
> prefixed instructions, and I found that we also have to
> prohibit kprobes to probe on such instruction.
>
> This series can be applied on -tip master branch which
> has merged Josh's objtool/perf sharing common x86 insn
> decoder series.

The paravirtualised xen-cpuid is were you'll see it most in a regular
kernel, but be aware that it is also used for testing purposes in other
circumstances, and there is an equivalent KVM prefix which is used for
KVM testing.

It might be better to generalise the decode support to "virtualisation
escape prefix" or something slightly more generic.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] VT-d: avoid PCI device lookup

2019-09-04 Thread Andrew Cooper

On 04/09/2019 14:27, Jan Beulich wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -530,6 +530,7 @@ struct intel_iommu {
>  struct ir_ctrl ir_ctrl;
>  struct iommu_flush flush;
>  struct acpi_drhd_unit *drhd;
> +nodeid_t node;
>  };

Can I talk you into putting this into "struct iommu" immediately?

It will collide far less with  the series I've got that folds all of
these redundant structs together.

~Andrew

https://lore.kernel.org/xen-devel/1550862806-30236-1-git-send-email-andrew.coop...@citrix.com/

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 2/2] x86/AMD: Fix handling of x87 exception pointers on Fam17h hardware

2019-09-04 Thread Andrew Cooper

AMD Pre-Fam17h CPUs "optimise" {F,}X{SAVE,RSTOR} by not saving/restoring
FOP/FIP/FDP if an x87 exception isn't pending.  This causes an information
leak, CVE-2006-1056, and worked around by several OSes, including Xen.  AMD
Fam17h CPUs no longer have this leak, and advertise so in a CPUID bit.

Introduce the RSTR_FP_ERR_PTRS feature, as specified by AMD, and expose to all
guests by default.  While adjusting libxl's cpuid table, add CLZERO which
looks to have been omitted previously.

Also introduce an X86_BUG bit to trigger the (F)XRSTOR workaround, and set it
on AMD hardware where RSTR_FP_ERR_PTRS is not advertised.  Optimise the
conditions for the workaround paths.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 

v3:
 * Rename to X86_BUG_FPU_PTRS
 * Reinstate, contrary to personal opinion, the fsw/fcw checks.

v2:
 * Use the AMD naming, not that I am convinced this is a sensible name to use.
 * Adjust the i387 codepaths as well as the xstate ones.
 * Add xen-cpuid/libxl data for the CPUID bit.
---
 tools/libxl/libxl_cpuid.c   |  3 +++
 tools/misc/xen-cpuid.c  |  1 +
 xen/arch/x86/cpu/amd.c  |  7 +++
 xen/arch/x86/i387.c | 16 +++-
 xen/arch/x86/xstate.c   |  7 +++
 xen/include/asm-x86/cpufeature.h|  3 +++
 xen/include/asm-x86/cpufeatures.h   |  2 ++
 xen/include/public/arch-x86/cpufeatureset.h |  1 +
 8 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index f1c6ce2076..953a3bbd8c 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -257,8 +257,11 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list 
*cpuid, const char* str)
 
 {"invtsc",   0x8007, NA, CPUID_REG_EDX,  8,  1},
 
+{"clzero",   0x8008, NA, CPUID_REG_EBX,  0,  1},
+{"rstr-fp-err-ptrs", 0x8008, NA, CPUID_REG_EBX, 2, 1},
 {"wbnoinvd", 0x8008, NA, CPUID_REG_EBX,  9,  1},
 {"ibpb", 0x8008, NA, CPUID_REG_EBX, 12,  1},
+
 {"nc",   0x8008, NA, CPUID_REG_ECX,  0,  8},
 {"apicidsize",   0x8008, NA, CPUID_REG_ECX, 12,  4},
 
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index be6a8d27a5..f51facffb6 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -145,6 +145,7 @@ static const char *const str_e7d[32] =
 static const char *const str_e8b[32] =
 {
 [ 0] = "clzero",
+[ 2] = "rstr-fp-err-ptrs",
 
 /* [ 8] */[ 9] = "wbnoinvd",
 
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index a2f83c79a5..dc9ed55ba6 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -580,6 +580,13 @@ static void init_amd(struct cpuinfo_x86 *c)
}
 
/*
+* Older AMD CPUs don't save/load FOP/FIP/FDP unless an FPU exception
+* is pending.  Xen works around this at (F)XRSTOR time.
+*/
+   if ( !cpu_has(c, X86_FEATURE_RSTR_FP_ERR_PTRS) )
+   setup_force_cpu_cap(X86_BUG_FPU_PTRS);
+
+   /*
 * Attempt to set lfence to be Dispatch Serialising.  This MSR almost
 * certainly isn't virtualised (and Xen at least will leak the real
 * value in but silently discard writes), as well as being per-core
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 88178485cb..e4f0965eed 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -43,20 +43,18 @@ static inline void fpu_fxrstor(struct vcpu *v)
 const typeof(v->arch.xsave_area->fpu_sse) *fpu_ctxt = v->arch.fpu_ctxt;
 
 /*
- * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
+ * Some CPUs don't save/restore FDP/FIP/FOP unless an exception
  * is pending. Clear the x87 state here by setting it to fixed
  * values. The hypervisor data segment can be sometimes 0 and
  * sometimes new user value. Both should be ok. Use the FPU saved
  * data block as a safe address because it should be in L1.
  */
-if ( !(fpu_ctxt->fsw & ~fpu_ctxt->fcw & 0x003f) &&
- boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-{
+if ( cpu_bug_fpu_ptrs &&
+ !(fpu_ctxt->fsw & ~fpu_ctxt->fcw & 0x003f) )
 asm volatile ( "fnclex\n\t"
"ffree %%st(7)\n\t" /* clear stack tag */
"fildl %0"  /* load to clear state */
: : "m" (*fpu_ctxt) );
-}
 
 /*
  * FXRSTOR can fault if passed a corrupted data block. We handle this
@@ -169,11 +167,11 @@ static inline void fpu_fxsave(struct vcpu *v)
: "=m" (*fpu_ctxt) :

Re: [Xen-devel] [PATCH] x86/cpu-policy: work around bogus warning in test harness

2019-09-05 Thread Andrew Cooper

On 05/09/2019 07:14, Jan Beulich wrote:
> Despite %.12s properly limiting the number of characters read from
> ident[], gcc 9 (at least up to 9.2.0) warns about the strings not
> being nul-terminated:
>
> test-cpu-policy.c:64:18: error: '%.12s' directive argument is not a 
> nul-terminated string [-Werror=format-overflow=]
>64 | fail("  Test '%.12s', expected vendor %u, got %u\n",
>   |  ^~
> test-cpu-policy.c:20:12: note: in definition of macro 'fail'
>20 | printf(fmt, ##__VA_ARGS__); \
>   |^~~
> test-cpu-policy.c:64:27: note: format string is defined here
>64 | fail("  Test '%.12s', expected vendor %u, got %u\n",
>   |   ^
> test-cpu-policy.c:44:7: note: referenced argument declared here
>44 | } tests[] = {
>   |   ^
>
> The issue was reported against gcc in their bugzilla (bug 91667).
>
> Re-order array entries, oddly enough suppressing the warning.
>
> Reported-by: Christopher Clark 

Reported-by: Dario Faggioli  as well.

> Signed-off-by: Jan Beulich 

I see you managed to create a smaller synthetic example.  That is just
bizzare, but oh well.

Acked-by: Andrew Cooper 

>
> --- a/tools/tests/cpu-policy/test-cpu-policy.c
> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
> @@ -42,15 +42,16 @@ static void test_vendor_identification(v
>  };
>  unsigned int vendor;
>  } tests[] = {
> +/* The 1st entry should remain here to work around gcc bug 91667. */
> +{ { "" }, X86_VENDOR_UNKNOWN },
> +{ { "" }, X86_VENDOR_UNKNOWN },
> +{ { "" }, X86_VENDOR_UNKNOWN },
> +
>  { { "GenuineIntel" }, X86_VENDOR_INTEL },
>  { { "AuthenticAMD" }, X86_VENDOR_AMD },
>  { { "CentaurHauls" }, X86_VENDOR_CENTAUR },
>  { { "  Shanghai  " }, X86_VENDOR_SHANGHAI },
>  { { "HygonGenuine" }, X86_VENDOR_HYGON },
> -
> -{ { "" }, X86_VENDOR_UNKNOWN },
> -{ { "" }, X86_VENDOR_UNKNOWN },
> -{ { "" }, X86_VENDOR_UNKNOWN },
>  };
>  
>  printf("Testing CPU vendor identification:\n");


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 11868 matches

Mail list logo