Re: [Xen-devel] [RFC PATCH V3 1/3] Xen: Increase hap/shadow page pool size to support more vcpus
On Wed, Sep 20, 2017 at 04:13:43PM +0100, Wei Liu wrote:
>On Tue, Sep 19, 2017 at 11:06:26AM +0800, Lan Tianyu wrote:
>> Hi Wei:
>>
>> On 2017-09-18 21:06, Wei Liu wrote:
>> > On Wed, Sep 13, 2017 at 12:52:47AM -0400, Lan Tianyu wrote:
>> >> This patch is to increase the page pool size when the max vcpu number
>> >> is larger than 128.
>> >>
>> >> Signed-off-by: Lan Tianyu
>> >> ---
>> >>  xen/arch/arm/domain.c    |  5 +
>> >>  xen/arch/x86/domain.c    | 25 +
>> >>  xen/common/domctl.c      |  3 +++
>> >>  xen/include/xen/domain.h |  2 ++
>> >>  4 files changed, 35 insertions(+)
>> >>
>> >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> >> index 6512f01..94cf70b 100644
>> >> --- a/xen/arch/arm/domain.c
>> >> +++ b/xen/arch/arm/domain.c
>> >> @@ -824,6 +824,11 @@ int arch_vcpu_reset(struct vcpu *v)
>> >>      return 0;
>> >>  }
>> >>
>> >> +int arch_domain_set_max_vcpus(struct domain *d)
>> >> +{
>> >> +    return 0;
>> >> +}
>> >> +
>> >>  static int relinquish_memory(struct domain *d, struct page_list_head *list)
>> >>  {
>> >>      struct page_info *page, *tmp;
>> >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> >> index dbddc53..0e230f9 100644
>> >> --- a/xen/arch/x86/domain.c
>> >> +++ b/xen/arch/x86/domain.c
>> >> @@ -1161,6 +1161,31 @@ int arch_vcpu_reset(struct vcpu *v)
>> >>      return 0;
>> >>  }
>> >>
>> >> +int arch_domain_set_max_vcpus(struct domain *d)
>> >
>> > The name doesn't match what the function does.
>> >
>>
>> I originally hoped to introduce a hook for each arch when setting max
>> vcpus. Each arch function can do customized things, hence the name
>> "arch_domain_set_max_vcpus".
>>
>> How about "arch_domain_setup_vcpus_resource"?
>
>Before you go away and do a lot of work, please let us think about
>whether this is the right approach first.
>
>We are close to freeze; with the amount of patches we receive every day,
>an RFC patch like this one is low on my (can't speak for others) priority
>list. I am not sure when I will be able to get back to this, but do ping
>us if you want to know where things stand.

Hi, Wei.

The goal of this patch is to avoid running out of shadow pages. The number
of shadow pages is initialized (to 256 for hap, and 1024 for shadow) when
creating the domain. Then the max vcpus is set. In this process, for each
vcpu, construct_vmcs()->paging_update_paging_modes()->hap_make_monitor_table()
always consumes a shadow page. If there are too many vcpus (i.e. more than
256), we run out of shadow pages.

To address this, there are three solutions:
1) Bump up the number of shadow pages to a proper value when setting max
vcpus, like this patch does. Actually, it can be done in the toolstack via
XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION.
2) The toolstack (see libxl__arch_domain_create->xc_shadow_control())
enlarges or shrinks the shadow memory to another size (according to xl.cfg,
the size is 1MB per guest vCPU plus 8KB per MB of guest RAM) after setting
max vcpus. If the sequence of the two operations could be exchanged, this
issue would also disappear.
3) Considering that the toolstack finally adjusts the shadow memory to a
proper size, enlarging the shadow pages from 256 to 512, just like what the
v1 patch
(https://lists.xenproject.org/archives/html/xen-devel/2017-08/msg03048.html)
does, doesn't lead to more memory consumption. Since it introduces minimal
change, I prefer this one.

Which one do you think is better?

Thanks
Chao
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
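For concreteness, the xl.cfg sizing rule quoted in option 2 (1MB per guest
vCPU plus 8KB per MB of guest RAM) works out as below. This is a minimal
illustration only; shadow_pool_kb() is a made-up helper for this note, not
a libxl function.

/* Illustrative sizing per the xl.cfg rule quoted above. */
static unsigned long shadow_pool_kb(unsigned int nr_vcpus,
                                    unsigned long guest_mb)
{
    return 1024UL * nr_vcpus + 8UL * guest_mb;
}

/*
 * e.g. a 288-vcpu, 4096MB guest: 294912KB + 32768KB = 327680KB,
 * i.e. 81920 4K pages -- far above the 256-page hap default that the
 * one-monitor-table-page-per-vcpu cost would otherwise exhaust.
 */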
[Xen-devel] [RFC Patch v4 0/8] Extend resources to support more vcpus in a single VM
This series is based on Paul Durrant's "x86: guest resource mapping"
(https://lists.xenproject.org/archives/html/xen-devel/2017-11/msg01735.html)
and "add vIOMMU support with irq remapping function of virtual VT-d"
(https://lists.xenproject.org/archives/html/xen-devel/2017-11/msg01063.html).

In order to support more vcpus in HVM, this series removes the vcpu number
constraints imposed by several components:
1. IOREQ server: currently only one IOREQ page is used, which limits the
   maximum number of vcpus to 128.
2. libacpi: no x2apic entry is built in the MADT and SRAT.
3. The size of the pre-allocated shadow memory.
4. The way we boot up APs.

This series is RFC because:
1. I am not sure whether the changes in patch 2 are acceptable.
2. It depends on our vIOMMU patches, which are still under review.

Changes since v3:
 - Respond to Wei's and Roger's comments.
 - Support multiple IOREQ pages. See patches 1 and 2.
 - Boot APs through broadcast. See patch 4.
 - Unify the computation of lapic_id.
 - Add an x2apic entry in the SRAT.
 - Increase shadow memory according to the maximum vcpus of HVM.

Changes since v2:
 1) Increase the page pool size when setting max vcpus.
 2) Allocate the MADT table size according to the APIC id of each vcpu.
 3) Fix some code style issues.

Changes since v1:
 1) Increase the hap page pool according to the vcpu number.
 2) Use "Processor" syntax to define vcpus with APIC id < 255 and use
    "Device" syntax for other vcpus in the ACPI DSDT table.
 3) Use the XAPIC structure for vcpus with APIC id < 255 and the x2APIC
    structure for other vcpus in the ACPI MADT table.

This patchset extends some resources (e.g. event channels, hap memory, and
so on) to support more vcpus for a single VM.

Chao Gao (6):
  ioreq: remove most 'buf' parameter from static functions
  ioreq: bump the number of IOREQ pages to 4
  xl/acpi: unify the computation of lapic_id
  hvmloader: boot cpu through broadcast
  x86/hvm: bump the number of pages of shadow memory
  x86/hvm: bump the maximum number of vcpus to 512

Lan Tianyu (2):
  Tool/ACPI: DSDT extension to support more vcpus
  hvmloader: Add x2apic entry support in the MADT and SRAT build

 tools/firmware/hvmloader/apic_regs.h    |   4 +
 tools/firmware/hvmloader/config.h       |   3 +-
 tools/firmware/hvmloader/smp.c          |  64 --
 tools/libacpi/acpi2_0.h                 |  25 +-
 tools/libacpi/build.c                   |  57 +---
 tools/libacpi/libacpi.h                 |   9 ++
 tools/libacpi/mk_dsdt.c                 |  40 +++--
 tools/libxc/include/xc_dom.h            |   2 +-
 tools/libxc/xc_dom_x86.c                |   6 +-
 tools/libxl/libxl_x86_acpi.c            |   2 +-
 xen/arch/x86/hvm/hvm.c                  |   1 +
 xen/arch/x86/hvm/ioreq.c                | 150 ++--
 xen/arch/x86/mm/hap/hap.c               |   2 +-
 xen/arch/x86/mm/shadow/common.c         |   2 +-
 xen/include/asm-x86/hvm/domain.h        |   6 +-
 xen/include/public/hvm/hvm_info_table.h |   2 +-
 xen/include/public/hvm/ioreq.h          |   2 +
 xen/include/public/hvm/params.h         |   8 +-
 18 files changed, 303 insertions(+), 82 deletions(-)

-- 
1.8.3.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
Each vcpu of an HVM guest consumes at least one shadow page. Currently,
only 256 pages (for the hap case) are pre-allocated as shadow memory at the
beginning. The pool would run out if the guest had more than 256 vcpus, and
guest creation would fail. Bump the number of shadow pages to
2 * HVM_MAX_VCPUS for the hap case and 8 * HVM_MAX_VCPUS for the shadow
case.

This patch won't lead to more memory consumption, because the size of the
shadow memory will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION
according to the size of guest memory and the number of vcpus.

Signed-off-by: Chao Gao
---
 xen/arch/x86/mm/hap/hap.c       | 2 +-
 xen/arch/x86/mm/shadow/common.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 41deb90..f4cf578 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -455,7 +455,7 @@ int hap_enable(struct domain *d, u32 mode)
     if ( old_pages == 0 )
     {
         paging_lock(d);
-        rv = hap_set_allocation(d, 256, NULL);
+        rv = hap_set_allocation(d, 2 * HVM_MAX_VCPUS, NULL);
         if ( rv != 0 )
         {
             hap_set_allocation(d, 0, NULL);
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 72c674e..5e66603 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3093,7 +3093,7 @@ int shadow_enable(struct domain *d, u32 mode)
     if ( old_pages == 0 )
     {
         paging_lock(d);
-        rv = shadow_set_allocation(d, 1024, NULL); /* Use at least 4MB */
+        rv = shadow_set_allocation(d, 8 * HVM_MAX_VCPUS, NULL);
         if ( rv != 0 )
         {
             shadow_set_allocation(d, 0, NULL);
-- 
1.8.3.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
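For scale, the new formulas reproduce the old defaults before patch 8/8 and
grow with it afterwards (simple arithmetic, not code from the patch):

/*
 * With HVM_MAX_VCPUS = 128 (today):   hap 2*128 =  256 pages (unchanged)
 *                                     shadow 8*128 = 1024 pages (unchanged)
 * With HVM_MAX_VCPUS = 512 (patch 8): hap 2*512 = 1024 pages =  4MB
 *                                     shadow 8*512 = 4096 pages = 16MB
 * Both pools are later resized via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
 * so the bump only covers the window before the toolstack's adjustment.
 */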
[Xen-devel] [RFC Patch v4 3/8] xl/acpi: unify the computation of lapic_id
There were two places where the lapic_id is computed, one in hvmloader and
one in libacpi. Unify them by defining LAPIC_ID in a header file and
including it in both places.

To address a compilation issue and make libacpi.h self-contained, include
stdint.h in libacpi.h.

Signed-off-by: Chao Gao
---
v4:
 - new
---
 tools/firmware/hvmloader/config.h | 3 +--
 tools/libacpi/libacpi.h           | 3 +++
 tools/libxl/libxl_x86_acpi.c      | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 6e00413..55e3a27 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -1,7 +1,7 @@
 #ifndef __HVMLOADER_CONFIG_H__
 #define __HVMLOADER_CONFIG_H__
 
-#include
+#include
 
 enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
 extern enum virtual_vga virtual_vga;
@@ -48,7 +48,6 @@ extern uint8_t ioapic_version;
 #define IOAPIC_ID 0x01
 
 #define LAPIC_BASE_ADDRESS 0xfee00000
-#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
 
 #define PCI_ISA_DEVFN 0x08       /* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 46a819d..b89fdb5 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -21,6 +21,9 @@
 #define __LIBACPI_H__
 
 #include
+#include <stdint.h> /* uintXX_t */
+
+#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
 
 #define ACPI_HAS_COM1 (1<<0)
 #define ACPI_HAS_COM2 (1<<1)
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index bbe9219..0b7507d 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -87,7 +87,7 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
 
 static uint32_t acpi_lapic_id(unsigned cpu)
 {
-    return cpu * 2;
+    return LAPIC_ID(cpu);
 }
 
 static int init_acpi_config(libxl__gc *gc,
-- 
1.8.3.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions
It is a preparation to support multiple IOREQ pages. No functional change. Signed-off-by: Chao Gao --- v4: -new --- xen/arch/x86/hvm/ioreq.c | 48 +++- 1 file changed, 23 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index d991ac9..a879f20 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -237,10 +237,9 @@ static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn) set_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask); } -static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, +struct hvm_ioreq_page *iorp) { -struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; - if ( gfn_eq(iorp->gfn, INVALID_GFN) ) return; @@ -289,15 +288,15 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) &iorp->va); if ( rc ) -hvm_unmap_ioreq_gfn(s, buf); +hvm_unmap_ioreq_gfn(s, iorp); return rc; } -static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, + struct hvm_ioreq_page *iorp) { struct domain *currd = current->domain; -struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; if ( iorp->page ) { @@ -344,10 +343,9 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) return 0; } -static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, + struct hvm_ioreq_page *iorp) { -struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; - if ( !iorp->page ) return; @@ -380,11 +378,11 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page) return found; } -static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, + struct hvm_ioreq_page *iorp) { struct domain *d = s->domain; -struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) ) return; @@ -395,10 +393,10 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) clear_page(iorp->va); } -static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, + struct hvm_ioreq_page *iorp) { struct domain *d = s->domain; -struct hvm_ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; int rc; if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) ) @@ -550,36 +548,36 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s) rc = hvm_map_ioreq_gfn(s, true); if ( rc ) -hvm_unmap_ioreq_gfn(s, false); +hvm_unmap_ioreq_gfn(s, &s->ioreq); return rc; } static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) { -hvm_unmap_ioreq_gfn(s, true); -hvm_unmap_ioreq_gfn(s, false); +hvm_unmap_ioreq_gfn(s, &s->ioreq); +hvm_unmap_ioreq_gfn(s, &s->bufioreq); } static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s) { int rc; -rc = hvm_alloc_ioreq_mfn(s, false); +rc = hvm_alloc_ioreq_mfn(s, &s->ioreq); if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) ) -rc = hvm_alloc_ioreq_mfn(s, true); +rc = hvm_alloc_ioreq_mfn(s, &s->bufioreq); if ( rc ) -hvm_free_ioreq_mfn(s, false); +hvm_free_ioreq_mfn(s, &s->ioreq); return rc; } static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s) { -hvm_free_ioreq_mfn(s, true); -hvm_free_ioreq_mfn(s, false); +hvm_free_ioreq_mfn(s, &s->bufioreq); +hvm_free_ioreq_mfn(s, &s->ioreq); } static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s) @@ -646,8 +644,8 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) if ( s->enabled ) goto done; -hvm_remove_ioreq_gfn(s, false); -hvm_remove_ioreq_gfn(s, true); +hvm_remove_ioreq_gfn(s, &s->ioreq); +hvm_remove_ioreq_gfn(s, &s->bufioreq); s->enabled = true; @@ -667,8 +665,8 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) if ( !s->enabled ) goto done; -hvm_add_ioreq_gfn(s, true); -hvm_add_ioreq_gfn(s, false); +hvm_add_ioreq_gfn(s, &s->bufioreq); +hvm_add_ioreq_gfn(s, &s->ioreq); s->enabled = false; -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ pages to 4
One 4K-byte page contains at most 128 'ioreq_t' structures. In order to
remove the vcpu number constraint imposed by one IOREQ page, bump the
number of IOREQ pages to 4. With this patch, multiple pages can be used as
IOREQ pages.

Basically, this patch extends the 'ioreq' field in struct hvm_ioreq_server
to an array. All accesses to the 'ioreq' field such as 's->ioreq' are
replaced with the FOR_EACH_IOREQ_PAGE macro.

In order to access an IOREQ page, QEMU should get the gmfn and map that
gmfn into its virtual address space. Now there are several pages; to stay
compatible with previous QEMU, the interface to get the gmfn doesn't
change. But newer QEMU needs to query the gmfn repeatedly until a gmfn it
has already seen is returned. To implement this, an internal index is
introduced: when QEMU queries the gmfn, the gmfn of the IOREQ page
referenced by the index is returned. After each operation, the index
increases by 1 and rewinds when it overflows.

Signed-off-by: Chao Gao
---
v4:
 - new
---
 tools/libxc/include/xc_dom.h     |   2 +-
 tools/libxc/xc_dom_x86.c         |   6 +-
 xen/arch/x86/hvm/hvm.c           |   1 +
 xen/arch/x86/hvm/ioreq.c         | 116 ++-
 xen/include/asm-x86/hvm/domain.h |   6 +-
 xen/include/public/hvm/ioreq.h   |   2 +
 xen/include/public/hvm/params.h  |   8 ++-
 7 files changed, 110 insertions(+), 31 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 45c9d67..2f8b412 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -20,7 +20,7 @@
 #include
 
 #define INVALID_PFN ((xen_pfn_t)-1)
-#define X86_HVM_NR_SPECIAL_PAGES    8
+#define X86_HVM_NR_SPECIAL_PAGES    11
 #define X86_HVM_END_SPECIAL_REGION  0xff000u
 
 /* --- typedefs and structs */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bff68a0..b316ebc 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -57,8 +58,8 @@
 #define SPECIALPAGE_BUFIOREQ 3
 #define SPECIALPAGE_XENSTORE 4
 #define SPECIALPAGE_IOREQ    5
-#define SPECIALPAGE_IDENT_PT 6
-#define SPECIALPAGE_CONSOLE  7
+#define SPECIALPAGE_IDENT_PT (5 + MAX_IOREQ_PAGE)
+#define SPECIALPAGE_CONSOLE  (SPECIALPAGE_IDENT_PT + 1)
 #define special_pfn(x) \
     (X86_HVM_END_SPECIAL_REGION - X86_HVM_NR_SPECIAL_PAGES + (x))
 
@@ -612,6 +613,7 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
                      X86_HVM_NR_SPECIAL_PAGES) )
             goto error_out;
 
+    xc_hvm_param_set(xch, domid, HVM_PARAM_IOREQ_PAGES, MAX_IOREQ_PAGE);
     xc_hvm_param_set(xch, domid, HVM_PARAM_STORE_PFN,
                      special_pfn(SPECIALPAGE_XENSTORE));
     xc_hvm_param_set(xch, domid, HVM_PARAM_BUFIOREQ_PFN,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5d06767..0b3bd04 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4077,6 +4077,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
     case HVM_PARAM_MCA_CAP:
+    case HVM_PARAM_IOREQ_PAGES:
         if ( value != 0 && a->value != value )
             rc = -EEXIST;
         break;
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index a879f20..0a36001 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -64,14 +64,24 @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
         continue; \
     else
 
+/* Iterate over all ioreq pages */
+#define FOR_EACH_IOREQ_PAGE(s, i, iorp) \
+    for ( (i) = 0, iorp = s->ioreq; (i) < (s)->ioreq_page_nr; (i)++, iorp++ )
+
 static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 {
-    shared_iopage_t *p = s->ioreq.va;
+    shared_iopage_t *p = s->ioreq[v->vcpu_id / IOREQ_NUM_PER_PAGE].va;
 
     ASSERT((v == current) || !vcpu_runnable(v));
     ASSERT(p != NULL);
 
-    return &p->vcpu_ioreq[v->vcpu_id];
+    return &p->vcpu_ioreq[v->vcpu_id % IOREQ_NUM_PER_PAGE];
+}
+
+static ioreq_t *get_ioreq_fallible(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    return s->ioreq[v->vcpu_id / IOREQ_NUM_PER_PAGE].va ?
+           get_ioreq(s, v) : NULL;
 }
 
 bool hvm_io_pending(struct vcpu *v)
@@ -252,10 +262,10 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s,
     iorp->gfn = INVALID_GFN;
 }
 
-static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf, uint8_t i)
 {
     struct domain *d = s->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[i];
     int rc;
 
     if ( iorp->page )
@@ -277,7 +287,7 @@ static int hvm_map_ioreq_gfn(struct h
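For reference, sizeof(ioreq_t) is 32 bytes, so 4096 / 32 = 128 slots per
page and 4 pages cover 512 vcpus. The round-robin query scheme described in
the commit message would look roughly like this on the emulator side. This
is a sketch only: xendevicemodel_get_ioreq_server_info() is the real libxc
call, but the surrounding bookkeeping and helper name are illustrative.

#include <stdbool.h>
#include <xendevicemodel.h>

#define MAX_IOREQ_PAGE 4

/* Query the synchronous ioreq gfn repeatedly; the server hands back
 * page 0, 1, 2, ... and wraps, so stop when a gfn repeats. */
static int map_all_ioreq_pages(xendevicemodel_handle *dmod,
                               domid_t domid, ioservid_t id)
{
    xen_pfn_t seen[MAX_IOREQ_PAGE];
    unsigned int nr = 0;

    for ( ;; ) {
        xen_pfn_t gfn;
        unsigned int i;

        if (xendevicemodel_get_ioreq_server_info(dmod, domid, id,
                                                 &gfn, NULL, NULL))
            return -1;

        for (i = 0; i < nr; i++)
            if (seen[i] == gfn)
                return 0;       /* wrapped around: all pages seen */

        seen[nr++] = gfn;
        /* map gfn here, e.g. with xenforeignmemory_map() */
    }
}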
[Xen-devel] [RFC Patch v4 6/8] hvmloader: Add x2apic entry support in the MADT and SRAT build
From: Lan Tianyu This patch contains the following changes: 1. add x2apic entry support for ACPI MADT table according to ACPI spec 5.2.12.12 Processor Local x2APIC Structure. 2. add x2apic entry support for ACPI SRAT table according to ACPI spec 5.2.16.3 Processor Local x2APIC Affinity Structure. Signed-off-by: Lan Tianyu Signed-off-by: Chao Gao --- v4: - also add x2apic entry in SRAT --- tools/libacpi/acpi2_0.h | 25 -- tools/libacpi/build.c | 57 +++-- 2 files changed, 69 insertions(+), 13 deletions(-) diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h index 6081417..7eb983d 100644 --- a/tools/libacpi/acpi2_0.h +++ b/tools/libacpi/acpi2_0.h @@ -322,6 +322,7 @@ struct acpi_20_waet { #define ACPI_IO_SAPIC 0x06 #define ACPI_PROCESSOR_LOCAL_SAPIC 0x07 #define ACPI_PLATFORM_INTERRUPT_SOURCES 0x08 +#define ACPI_PROCESSOR_LOCAL_X2APIC 0x09 /* * APIC Structure Definitions. @@ -338,6 +339,15 @@ struct acpi_20_madt_lapic { uint32_t flags; }; +struct acpi_20_madt_x2apic { +uint8_t type; +uint8_t length; +uint16_t reserved; +uint32_t x2apic_id; +uint32_t flags; +uint32_t acpi_processor_uid; +}; + /* * Local APIC Flags. All other bits are reserved and must be 0. */ @@ -378,8 +388,9 @@ struct acpi_20_srat { /* * System Resource Affinity Table structure types. */ -#define ACPI_PROCESSOR_AFFINITY 0x0 -#define ACPI_MEMORY_AFFINITY0x1 +#define ACPI_PROCESSOR_AFFINITY 0x0 +#define ACPI_MEMORY_AFFINITY0x1 +#define ACPI_PROCESSOR_X2APIC_AFFINITY 0x2 struct acpi_20_srat_processor { uint8_t type; uint8_t length; @@ -391,6 +402,16 @@ struct acpi_20_srat_processor { uint32_t reserved; }; +struct acpi_20_srat_processor_x2apic { +uint8_t type; +uint8_t length; +uint16_t reserved; +uint32_t domain; +uint32_t x2apic_id; +uint32_t flags; +uint32_t reserved2[2]; +}; + /* * Local APIC Affinity Flags. All other bits are reserved and must be 0. */ diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c index df0a67c..5cbf6a9 100644 --- a/tools/libacpi/build.c +++ b/tools/libacpi/build.c @@ -30,6 +30,11 @@ #define align16(sz)(((sz) + 15) & ~15) #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d)) +#define min(X, Y) ({ \ +const typeof (X) _x = (X); \ +const typeof (Y) _y = (Y); \ +(void) (&_x == &_y); \ +(_x < _y) ? 
_x : _y; }) extern struct acpi_20_rsdp Rsdp; extern struct acpi_20_rsdt Rsdt; @@ -79,16 +84,19 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt, struct acpi_20_madt_intsrcovr *intsrcovr; struct acpi_20_madt_ioapic*io_apic; struct acpi_20_madt_lapic *lapic; +struct acpi_20_madt_x2apic*x2apic; const struct hvm_info_table *hvminfo = config->hvminfo; -int i, sz; +int i, sz, nr_apic; if ( config->lapic_id == NULL ) return NULL; +nr_apic = min(hvminfo->nr_vcpus, MADT_MAX_LOCAL_APIC); sz = sizeof(struct acpi_20_madt); sz += sizeof(struct acpi_20_madt_intsrcovr) * 16; sz += sizeof(struct acpi_20_madt_ioapic); -sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus; +sz += sizeof(struct acpi_20_madt_lapic) * nr_apic; +sz += sizeof(struct acpi_20_madt_x2apic) * (hvminfo->nr_vcpus - nr_apic); madt = ctxt->mem_ops.alloc(ctxt, sz, 16); if (!madt) return NULL; @@ -149,7 +157,7 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt, info->nr_cpus = hvminfo->nr_vcpus; info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic); -for ( i = 0; i < hvminfo->nr_vcpus; i++ ) +for ( i = 0; i < nr_apic; i++ ) { memset(lapic, 0, sizeof(*lapic)); lapic->type= ACPI_PROCESSOR_LOCAL_APIC; @@ -157,12 +165,26 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt, /* Processor ID must match processor-object IDs in the DSDT. */ lapic->acpi_processor_id = i; lapic->apic_id = config->lapic_id(i); -lapic->flags = (test_bit(i, hvminfo->vcpu_online) -? ACPI_LOCAL_APIC_ENABLED : 0); +lapic->flags = test_bit(i, hvminfo->vcpu_online) + ? ACPI_LOCAL_APIC_ENABLED : 0; lapic++; } -madt->header.length = (unsigned char *)lapic - (unsigned char *)madt; +x2apic = (void *)lapic; +for ( ; i < hvminfo->nr_vcpus; i++ ) +{ +memset(x2apic, 0, sizeof(*x2apic)); +x2apic->type= ACPI_PROCESSOR_LOCAL_X2APIC; +x2apic->length = sizeof(*x2apic); +/* Processor UID must match processor-object UIDs in the DSDT. */ +x2apic->acpi_process
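A quick sanity check on where the MADT now switches structure types, using
LAPIC_ID(vcpu) = vcpu * 2 and MADT_MAX_LOCAL_APIC = 128 from patch 5/8
(arithmetic only, not code from the patch):

/*
 * vcpu 127 -> APIC ID 254: last Local APIC structure (the ID still
 *             fits in 8 bits; 255 is the xAPIC broadcast ID)
 * vcpu 128 -> APIC ID 256: first Local x2APIC structure
 *
 * Hence nr_apic = min(hvminfo->nr_vcpus, MADT_MAX_LOCAL_APIC) splits
 * the table exactly at the 8-bit APIC ID limit.
 */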
[Xen-devel] [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
Intel SDM "Extended XAPIC (X2APIC)" -> "Initialization by System Software"
has the following description:

"The ACPI interfaces for the x2APIC are described in Section 5.2, "ACPI
System Description Tables," of the Advanced Configuration and Power
Interface Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The
default behavior for BIOS is to pass the control to the operating system
with the local x2APICs in xAPIC mode if all APIC IDs reported by
CPUID.0BH:EDX are less than 255, and in x2APIC mode if there are any
logical processor reporting an APIC ID of 255 or greater."

In this patch, hvmloader enables x2apic mode for all vcpus if there are
cpus with APIC ID > 255. To wake up processors whose APIC ID is greater
than 255, the SIPI is broadcast to all APs. This is how SeaBIOS wakes up
APs. APs may compete for the stack, thus a lock is introduced to protect
the stack.

Signed-off-by: Chao Gao
---
v4:
 - new
---
 tools/firmware/hvmloader/apic_regs.h |  4 +++
 tools/firmware/hvmloader/smp.c       | 64
 2 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/tools/firmware/hvmloader/apic_regs.h b/tools/firmware/hvmloader/apic_regs.h
index f737b47..bc39ecd 100644
--- a/tools/firmware/hvmloader/apic_regs.h
+++ b/tools/firmware/hvmloader/apic_regs.h
@@ -105,6 +105,10 @@
 #define APIC_TDR_DIV_64  0x9
 #define APIC_TDR_DIV_128 0xA
 
+#define MSR_IA32_APICBASE      0x1b
+#define MSR_IA32_APICBASE_EXTD (1<<10)
+#define MSR_IA32_APICBASE_MSR  0x800
+
 #endif
 
 /*
diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
index 082b17f..e3dade4 100644
--- a/tools/firmware/hvmloader/smp.c
+++ b/tools/firmware/hvmloader/smp.c
@@ -26,7 +26,9 @@
 #define AP_BOOT_EIP 0x1000
 extern char ap_boot_start[], ap_boot_end[];
 
-static int ap_callin, ap_cpuid;
+static int ap_callin;
+static int enable_x2apic;
+static bool lock = 1;
 
 asm (
     "    .text                       \n"
@@ -47,7 +49,15 @@ asm (
     "    mov  %eax,%ds               \n"
     "    mov  %eax,%es               \n"
     "    mov  %eax,%ss               \n"
-    "    movl $stack_top,%esp        \n"
+    "3:  movb $1, %bl                \n"
+    "    mov  $lock,%edx             \n"
+    "    movzbl %bl,%eax             \n"
+    "    xchg %al, (%edx)            \n"
+    "    test %al,%al                \n"
+    "    je   2f                     \n"
+    "    pause                       \n"
+    "    jmp  3b                     \n"
+    "2:  movl $stack_top,%esp        \n"
     "    movl %esp,%ebp              \n"
     "    call ap_start               \n"
     "1:  hlt                         \n"
@@ -68,14 +78,34 @@ asm (
     "    .text                       \n"
     );
 
+unsigned int ap_cpuid(void)
+{
+    if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) )
+    {
+        uint32_t eax, ebx, ecx, edx;
+
+        cpuid(1, &eax, &ebx, &ecx, &edx);
+        return ebx >> 24;
+    }
+    else
+        return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4));
+}
+
 void ap_start(void); /* non-static avoids unused-function compiler warning */
 /*static*/ void ap_start(void)
 {
-    printf(" - CPU%d ... ", ap_cpuid);
+    printf(" - CPU%d ... ", ap_cpuid());
     cacheattr_init();
     printf("done.\n");
     wmb();
-    ap_callin = 1;
+    ap_callin++;
+
+    if ( enable_x2apic )
+        wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
+                                 MSR_IA32_APICBASE_EXTD);
+
+    /* Release the lock */
+    asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
 }
 
 static void lapic_wait_ready(void)
@@ -89,7 +119,6 @@ static void boot_cpu(unsigned int cpu)
     unsigned int icr2 = SET_APIC_DEST_FIELD(LAPIC_ID(cpu));
 
     /* Initialise shared variables. */
-    ap_cpuid = cpu;
     ap_callin = 0;
     wmb();
@@ -118,6 +147,21 @@ static void boot_cpu(unsigned int cpu)
     lapic_wait_ready();
 }
 
+static void boot_cpu_broadcast_x2apic(unsigned int nr_cpus)
+{
+    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
+          APIC_DEST_ALLBUT | APIC_DM_INIT);
+
+    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
+          APIC_DEST_ALLBUT | APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
+
+    while ( ap_callin != nr_cpus )
+        cpu_relax();
+
+    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
+          APIC_DEST_ALLBUT | APIC_DM_INIT);
+}
+
 void smp_initialise(void)
 {
     unsigned int i, nr_cpus = hvm_info->nr_vcpus;
@@ -125,9 +169,15 @@ void smp_initialise(void)
 
     memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
     printf("
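The AP entry asm above implements an xchg-based test-and-set spinlock
around the shared boot stack. A C rendition for readability only (the real
code must run before any usable stack exists, which is why it is asm; note
'lock' starts at 1, so APs proceed only once it is first released):

static volatile unsigned char lock = 1;   /* 1 = held */

static void claim_boot_stack(void)
{
    for ( ;; ) {
        unsigned char old = 1;

        /* atomic swap: reading back 0 means the lock was free and is now ours */
        asm volatile ( "xchgb %0, %1"
                       : "+q" (old), "+m" (lock) :: "memory" );
        if ( old == 0 )
            return;

        asm volatile ( "pause" );         /* spin politely */
    }
}

/* ap_start() releases it when done with the stack, mirroring the patch:
 *     asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
 */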
[Xen-devel] [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
Signed-off-by: Chao Gao --- xen/include/public/hvm/hvm_info_table.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/public/hvm/hvm_info_table.h b/xen/include/public/hvm/hvm_info_table.h index 08c252e..6833a4c 100644 --- a/xen/include/public/hvm/hvm_info_table.h +++ b/xen/include/public/hvm/hvm_info_table.h @@ -32,7 +32,7 @@ #define HVM_INFO_PADDR ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET) /* Maximum we can support with current vLAPIC ID mapping. */ -#define HVM_MAX_VCPUS128 +#define HVM_MAX_VCPUS512 /* * In some cases SMP HVM guests may require knowledge of Xen's idea of vCPU ids -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
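As a consistency check across the series (simple arithmetic, not part of
the patch):

/*
 * With LAPIC_ID(vcpu_id) = vcpu_id * 2, HVM_MAX_VCPUS = 512 gives a
 * top APIC ID of 1022. That no longer fits the 8-bit xAPIC ID space,
 * which is why the earlier patches add x2apic MADT/SRAT entries,
 * x2apic-mode AP bringup, and Device-based DSDT objects for the high
 * vcpus.
 */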
[Xen-devel] [RFC Patch v4 5/8] Tool/ACPI: DSDT extension to support more vcpus
From: Lan Tianyu

This patch is to change the DSDT table for processor objects to support
4096 vcpus, according to ACPI spec 8.4 "Declaring Processors".

This patch contains the following two changes:
1. Declare processors whose local APIC is declared as an x2apic via the
   ASL Device statement.
2. Bump up the size of the CPU ID used to compose the processor name to
   12 bits. Thus the processor number limit imposed here is 4096.

Signed-off-by: Lan Tianyu
Signed-off-by: Chao Gao
---
 tools/libacpi/libacpi.h |  6 ++
 tools/libacpi/mk_dsdt.c | 40 +---
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index b89fdb5..7db4d92 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -24,6 +24,12 @@
 #include <stdint.h> /* uintXX_t */
 
 #define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
+/*
+ * For x86, APIC ID is twice the vcpu id. In MADT, only APICs with
+ * APIC ID <= 254 can be declared as local APIC. Otherwise, APICs with
+ * APIC ID > 254 should be declared as local x2APIC.
+ */
+#define MADT_MAX_LOCAL_APIC 128U
 
 #define ACPI_HAS_COM1 (1<<0)
 #define ACPI_HAS_COM2 (1<<1)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 2daf32c..27e5d1b 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -20,10 +20,13 @@
 #if defined(CONFIG_X86)
 #include
 #include
+#include "libacpi.h"
 #elif defined(CONFIG_ARM_64)
 #include
 #endif
 
+#define CPU_NAME_FMT "P%.03X"
+
 static unsigned int indent_level;
 static bool debug = false;
@@ -194,12 +197,35 @@ int main(int argc, char **argv)
 #endif
 
     /* Define processor objects and control methods. */
-    for ( cpu = 0; cpu < max_cpus; cpu++)
+    for ( cpu = 0; cpu < max_cpus; cpu++ )
     {
-        push_block("Processor", "PR%02X, %d, 0xb010, 0x06", cpu, cpu);
-        stmt("Name", "_HID, \"ACPI0007\"");
+#ifdef CONFIG_X86
+        /*
+         * According to the Processor Local x2APIC Structure of ACPI SPEC
+         * Revision 5.0, "OSPM associates the X2APIC Structure with a
+         * processor object declared in the namespace using the Device
+         * statement, when the _UID child object of the processor device
+         * evaluates to a numeric value, by matching the numeric value with
+         * this field".
+         *
+         * Anyhow, a numeric value is assigned to the _UID object here.
+         * Thus, for each x2apic structure in the MADT, instead of declaring
+         * the corresponding processor via the ASL Processor statement,
+         * declare it via the ASL Device statement.
+         *
+         * Note that if the CPU ID is equal to or greater than
+         * MADT_MAX_LOCAL_APIC, the lapic of this CPU should be enumerated
+         * as a local x2apic structure.
+         */
+        if ( cpu >= MADT_MAX_LOCAL_APIC )
+            push_block("Device", CPU_NAME_FMT, cpu);
+        else
+#endif
+            push_block("Processor", CPU_NAME_FMT ", %d, 0xb010, 0x06",
+                       cpu, cpu);
+        stmt("Name", "_HID, \"ACPI0007\"");
         stmt("Name", "_UID, %d", cpu);
 #ifdef CONFIG_ARM_64
         pop_block();
@@ -268,15 +294,15 @@ int main(int argc, char **argv)
     /* Extract current CPU's status: 0=offline; 1=online. */
     stmt("And", "Local1, 1, Local2");
     /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
-    push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
+    push_block("If", "LNotEqual(Local2, \\_SB." CPU_NAME_FMT ".FLG)", cpu);
     /* ...If not, update it and the MADT checksum, and notify OSPM. */
-    stmt("Store", "Local2, \\_SB.PR%02X.FLG", cpu);
+    stmt("Store", "Local2, \\_SB." CPU_NAME_FMT ".FLG", cpu);
     push_block("If", "LEqual(Local2, 1)");
-    stmt("Notify", "PR%02X, 1", cpu); /* Notify: Device Check */
+    stmt("Notify", CPU_NAME_FMT ", 1", cpu); /* Notify: Device Check */
     stmt("Subtract", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
     pop_block();
     push_block("Else", NULL);
-    stmt("Notify", "PR%02X, 3", cpu); /* Notify: Eject Request */
+    stmt("Notify", CPU_NAME_FMT ", 3", cpu); /* Notify: Eject Request */
     stmt("Add", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
     pop_block();
     pop_block();
-- 
1.8.3.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions
On Wed, Dec 06, 2017 at 02:44:52PM +, Paul Durrant wrote: >> -Original Message- >> From: Chao Gao [mailto:chao@intel.com] >> Sent: 06 December 2017 07:50 >> To: xen-de...@lists.xen.org >> Cc: Chao Gao ; Andrew Cooper >> ; Jan Beulich ; Paul >> Durrant >> Subject: [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static >> functions >> >> It is a preparation to support multiple IOREQ pages. >> No functional change. >> >> Signed-off-by: Chao Gao >> --- >> v4: >> -new >> --- >> xen/arch/x86/hvm/ioreq.c | 48 +++-- >> --- >> 1 file changed, 23 insertions(+), 25 deletions(-) >> >> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c >> index d991ac9..a879f20 100644 >> --- a/xen/arch/x86/hvm/ioreq.c >> +++ b/xen/arch/x86/hvm/ioreq.c >> @@ -237,10 +237,9 @@ static void hvm_free_ioreq_gfn(struct >> hvm_ioreq_server *s, gfn_t gfn) >> set_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask); >> } >> >> -static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) >> +static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, >> +struct hvm_ioreq_page *iorp) >> { >> -struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; >> - > >I don't really like this approach. I'd prefer swapping the bool for an >unsigned page index, where we follow the convention adopted in >hvm_get_ioreq_server_frame() for which macros exist: 0 equating to the >bufioreq page, 1+ for the struct-per-cpu pages. Ok. I have no preference for these two. But I will take your advice. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
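For reference, the convention Paul mentions comes from the resource-mapping
series' public/memory.h: frame index 0 is the buffered-ioreq page and 1+
are the synchronous pages. A sketch of what the reworked helper might look
like; the accessor below is illustrative, not the final code:

#define XENMEM_resource_ioreq_server_frame_bufioreq 0
#define XENMEM_resource_ioreq_server_frame_ioreq(n) (1 + (n))

/* Hypothetical accessor: pick the page by frame index instead of a
 * 'buf' boolean; an array-based 'ioreq' field would use idx - 1. */
static struct hvm_ioreq_page *hvm_ioreq_page(struct hvm_ioreq_server *s,
                                             unsigned int idx)
{
    return idx == XENMEM_resource_ioreq_server_frame_bufioreq ?
           &s->bufioreq : &s->ioreq;
}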
Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ pages to 4
On Wed, Dec 06, 2017 at 03:04:11PM +, Paul Durrant wrote: >> -Original Message- >> From: Chao Gao [mailto:chao@intel.com] >> Sent: 06 December 2017 07:50 >> To: xen-de...@lists.xen.org >> Cc: Chao Gao ; Paul Durrant >> ; Tim (Xen.org) ; Stefano Stabellini >> ; Konrad Rzeszutek Wilk >> ; Jan Beulich ; George >> Dunlap ; Andrew Cooper >> ; Wei Liu ; Ian Jackson >> >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 >> pages >> >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the vcpu >> number constraint imposed by one IOREQ page, bump the number of IOREQ >> page to >> 4 pages. With this patch, multiple pages can be used as IOREQ page. >> >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server to an >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced with >> FOR_EACH_IOREQ_PAGE macro. >> >> In order to access an IOREQ page, QEMU should get the gmfn and map this >> gmfn >> to its virtual address space. > >No. There's no need to extend the 'legacy' mechanism of using magic page gfns. >You should only handle the case where the mfns are allocated on demand (see >the call to hvm_ioreq_server_alloc_pages() in hvm_get_ioreq_server_frame()). >The number of guest vcpus is known at this point so the correct number of >pages can be allocated. If the creator of the ioreq server attempts to use the >legacy hvm_get_ioreq_server_info() and the guest has >128 vcpus then the call >should fail. Great suggestion. I will introduce a new dmop, a variant of hvm_get_ioreq_server_frame() for creator to get an array of gfns and the size of array. And the legacy interface will report an error if more than one IOREQ PAGES are needed. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
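The page count implied by this discussion is straightforward. A sketch of
the check the legacy interface would perform (DIV_ROUND_UP is the xen/lib.h
helper; the function itself is illustrative, not committed code):

static int check_legacy_ioreq_gfn(const struct domain *d)
{
    /* 4096 / sizeof(ioreq_t) = 128 vcpus per synchronous page */
    unsigned int nr = DIV_ROUND_UP(d->max_vcpus * sizeof(ioreq_t),
                                   PAGE_SIZE);

    /* The legacy single-gfn interface cannot describe more than one page,
     * so guests with >128 vcpus must use the new multi-page query. */
    return nr > 1 ? -EINVAL : 0;
}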
Re: [Xen-devel] [PATCH v14 04/11] x86/hvm/ioreq: defer mapping gfns until they are actually requested
On Tue, Nov 28, 2017 at 03:08:46PM +, Paul Durrant wrote: >A subsequent patch will introduce a new scheme to allow an emulator to >map ioreq server pages directly from Xen rather than the guest P2M. > >This patch lays the groundwork for that change by deferring mapping of >gfns until their values are requested by an emulator. To that end, the >pad field of the xen_dm_op_get_ioreq_server_info structure is re-purposed >to a flags field and new flag, XEN_DMOP_no_gfns, defined which modifies the >behaviour of XEN_DMOP_get_ioreq_server_info to allow the caller to avoid >requesting the gfn values. > >Signed-off-by: Paul Durrant >Reviewed-by: Roger Pau Monné >Acked-by: Wei Liu >Reviewed-by: Jan Beulich >--- >Cc: Ian Jackson >Cc: Andrew Cooper >Cc: George Dunlap >Cc: Konrad Rzeszutek Wilk >Cc: Stefano Stabellini >Cc: Tim Deegan > >v8: > - For safety make all of the pointers passed to > hvm_get_ioreq_server_info() optional. > - Shrink bufioreq_handling down to a uint8_t. > >v3: > - Updated in response to review comments from Wei and Roger. > - Added a HANDLE_BUFIOREQ macro to make the code neater. > - This patch no longer introduces a security vulnerability since there > is now an explicit limit on the number of ioreq servers that may be > created for any one domain. >--- > tools/libs/devicemodel/core.c | 8 + > tools/libs/devicemodel/include/xendevicemodel.h | 6 ++-- > xen/arch/x86/hvm/dm.c | 9 +++-- > xen/arch/x86/hvm/ioreq.c| 47 ++--- > xen/include/asm-x86/hvm/domain.h| 2 +- > xen/include/public/hvm/dm_op.h | 32 ++--- > 6 files changed, 63 insertions(+), 41 deletions(-) > >diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c >index b66d4f9294..e684e657b6 100644 >--- a/tools/libs/devicemodel/core.c >+++ b/tools/libs/devicemodel/core.c >@@ -204,6 +204,14 @@ int xendevicemodel_get_ioreq_server_info( > > data->id = id; > >+/* >+ * If the caller is not requesting gfn values then instruct the >+ * hypercall not to retrieve them as this may cause them to be >+ * mapped. >+ */ >+if (!ioreq_gfn && !bufioreq_gfn) >+data->flags |= XEN_DMOP_no_gfns; >+ > rc = xendevicemodel_op(dmod, domid, 1, &op, sizeof(op)); > if (rc) > return rc; >diff --git a/tools/libs/devicemodel/include/xendevicemodel.h >b/tools/libs/devicemodel/include/xendevicemodel.h >index dda0bc7695..fffee3a4a0 100644 >--- a/tools/libs/devicemodel/include/xendevicemodel.h >+++ b/tools/libs/devicemodel/include/xendevicemodel.h >@@ -61,11 +61,11 @@ int xendevicemodel_create_ioreq_server( > * @parm domid the domain id to be serviced > * @parm id the IOREQ Server id. > * @parm ioreq_gfn pointer to a xen_pfn_t to receive the synchronous ioreq >- * gfn >+ * gfn. (May be NULL if not required) > * @parm bufioreq_gfn pointer to a xen_pfn_t to receive the buffered ioreq >- *gfn >+ *gfn. (May be NULL if not required) > * @parm bufioreq_port pointer to a evtchn_port_t to receive the buffered >- * ioreq event channel >+ * ioreq event channel. (May be NULL if not required) > * @return 0 on success, -1 on failure. 
> */ > int xendevicemodel_get_ioreq_server_info( >diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c >index a787f43737..3c617bd754 100644 >--- a/xen/arch/x86/hvm/dm.c >+++ b/xen/arch/x86/hvm/dm.c >@@ -416,16 +416,19 @@ static int dm_op(const struct dmop_args *op_args) > { > struct xen_dm_op_get_ioreq_server_info *data = > &op.u.get_ioreq_server_info; >+const uint16_t valid_flags = XEN_DMOP_no_gfns; > > const_op = false; > > rc = -EINVAL; >-if ( data->pad ) >+if ( data->flags & ~valid_flags ) > break; > > rc = hvm_get_ioreq_server_info(d, data->id, >- &data->ioreq_gfn, >- &data->bufioreq_gfn, >+ (data->flags & XEN_DMOP_no_gfns) ? >+ NULL : &data->ioreq_gfn, >+ (data->flags & XEN_DMOP_no_gfns) ? >+ NULL : &data->bufioreq_gfn, >&data->bufioreq_port); > break; > } >diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c >index eec4e4771e..39de659ddf 100644 >--- a/xen/arch/x86/hvm/ioreq.c >+++ b/xen/arch/x86/hvm/ioreq.c >@@ -350,6 +350,9 @@ static void hvm_update_ioreq_evtchn(struct >hvm_ioreq_server *s, > } > } > >+#define HANDLE_BUFIOREQ(s) \ >+((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) >+ > static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, > struct vcpu *v) > { >@@ -371
Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ pages to 4
On Thu, Dec 07, 2017 at 08:41:14AM +, Paul Durrant wrote: >> -Original Message- >> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf >> Of Paul Durrant >> Sent: 06 December 2017 16:10 >> To: 'Chao Gao' >> Cc: Stefano Stabellini ; Wei Liu >> ; Andrew Cooper ; Tim >> (Xen.org) ; George Dunlap ; >> xen-de...@lists.xen.org; Jan Beulich ; Ian Jackson >> >> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of >> IOREQ page to 4 pages >> >> > -Original Message- >> > From: Chao Gao [mailto:chao@intel.com] >> > Sent: 06 December 2017 09:02 >> > To: Paul Durrant >> > Cc: xen-de...@lists.xen.org; Tim (Xen.org) ; Stefano >> > Stabellini ; Konrad Rzeszutek Wilk >> > ; Jan Beulich ; George >> > Dunlap ; Andrew Cooper >> > ; Wei Liu ; Ian Jackson >> > >> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 >> > pages >> > >> > On Wed, Dec 06, 2017 at 03:04:11PM +, Paul Durrant wrote: >> > >> -Original Message- >> > >> From: Chao Gao [mailto:chao@intel.com] >> > >> Sent: 06 December 2017 07:50 >> > >> To: xen-de...@lists.xen.org >> > >> Cc: Chao Gao ; Paul Durrant >> > >> ; Tim (Xen.org) ; Stefano >> > Stabellini >> > >> ; Konrad Rzeszutek Wilk >> > >> ; Jan Beulich ; George >> > >> Dunlap ; Andrew Cooper >> > >> ; Wei Liu ; Ian >> > Jackson >> > >> >> > >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 >> > >> pages >> > >> >> > >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the >> > vcpu >> > >> number constraint imposed by one IOREQ page, bump the number of >> > IOREQ >> > >> page to >> > >> 4 pages. With this patch, multiple pages can be used as IOREQ page. >> > >> >> > >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server >> > >> to >> an >> > >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced >> > >> with >> > >> FOR_EACH_IOREQ_PAGE macro. >> > >> >> > >> In order to access an IOREQ page, QEMU should get the gmfn and map >> > this >> > >> gmfn >> > >> to its virtual address space. >> > > >> > >No. There's no need to extend the 'legacy' mechanism of using magic >> page >> > gfns. You should only handle the case where the mfns are allocated on >> > demand (see the call to hvm_ioreq_server_alloc_pages() in >> > hvm_get_ioreq_server_frame()). The number of guest vcpus is known at >> > this point so the correct number of pages can be allocated. If the creator >> > of >> > the ioreq server attempts to use the legacy hvm_get_ioreq_server_info() >> > and the guest has >128 vcpus then the call should fail. >> > >> > Great suggestion. I will introduce a new dmop, a variant of >> > hvm_get_ioreq_server_frame() for creator to get an array of gfns and the >> > size of array. And the legacy interface will report an error if more >> > than one IOREQ PAGES are needed. >> >> You don't need a new dmop for mapping I think. The mem op to map ioreq >> server frames should work. All you should need to do is update >> hvm_get_ioreq_server_frame() to deal with an index > 1, and provide some >> means for the ioreq server creator to convert the number of guest vcpus into >> the correct number of pages to map. (That might need a new dm op). > >I realise after saying this that an emulator already knows the size of the >ioreq structure and so can easily calculate the correct number of pages to >map, given the number of guest vcpus. How about the patch in the bottom? Is it in the right direction? 
Do you have the QEMU patch which replaces the old method with the new
method to set up the mapping? I want to integrate that patch and do some
tests.

Thanks
Chao

From 44919e1e80f36981d6e213f74302c8c89cc9f828 Mon Sep 17 00:00:00 2001
From: Chao Gao
Date: Tue, 5 Dec 2017 14:20:24 +0800
Subject: [PATCH] ioreq: add support of multiple ioreq pages

Each vcpu should have a corresponding 'ioreq_t' structure in the ioreq
page. Currently, only one 4K-byte page is used as ioreq page. Thus it also
limits the number of vcpu to 12
Re: [Xen-devel] [PATCH v3 2/3] xen/pt: Pass the whole msi addr/data to Xen
On Mon, Dec 11, 2017 at 05:59:08PM +0000, Anthony PERARD wrote:
>On Fri, Nov 17, 2017 at 02:24:24PM +0800, Chao Gao wrote:
>> Previously, some fields (reserved or unalterable) are filtered out by
>> QEMU. These fields are useless for the legacy interrupt format.
>> However, these fields may be meaningful (on Intel platforms) for
>> interrupts of the remapping format. It is better to pass the whole MSI
>> addr/data to Xen without any filtering.
>>
>> The main reason why we want this is that QEMU doesn't have the
>> knowledge to decide the interrupt format after we introduce a vIOMMU
>> inside Xen. Pass the whole MSI message down and let the arch-specific
>> vIOMMU decide the interrupt format.
>>
>> Signed-off-by: Chao Gao
>> Signed-off-by: Lan Tianyu
>> ---
>> v3:
>>  - new
>> ---
>>  hw/xen/xen_pt_msi.c | 47 ---
>>  1 file changed, 12 insertions(+), 35 deletions(-)
>>
>> diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
>> index 6d1e3bd..f7d6e76 100644
>> --- a/hw/xen/xen_pt_msi.c
>> +++ b/hw/xen/xen_pt_msi.c
>> @@ -47,25 +47,6 @@ static inline uint32_t msi_ext_dest_id(uint32_t addr_hi)
>>      return addr_hi & 0xff00;
>>  }
>>
>> -static uint32_t msi_gflags(uint32_t data, uint64_t addr)
>> -{
>> -    uint32_t result = 0;
>> -    int rh, dm, dest_id, deliv_mode, trig_mode;
>> -
>> -    rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1;
>> -    dm = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
>> -    dest_id = msi_dest_id(addr);
>> -    deliv_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
>> -    trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>> -
>> -    result = dest_id | (rh << XEN_PT_GFLAGS_SHIFT_RH)
>> -        | (dm << XEN_PT_GFLAGS_SHIFT_DM)
>> -        | (deliv_mode << XEN_PT_GFLAGSSHIFT_DELIV_MODE)
>> -        | (trig_mode << XEN_PT_GFLAGSSHIFT_TRG_MODE);
>> -
>> -    return result;
>> -}
>> -
>>  static inline uint64_t msi_addr64(XenPTMSI *msi)
>>  {
>>      return (uint64_t)msi->addr_hi << 32 | msi->addr_lo;
>> @@ -160,23 +141,20 @@ static int msi_msix_update(XenPCIPassthroughState *s,
>>                             bool masked)
>>  {
>>      PCIDevice *d = &s->dev;
>> -    uint8_t gvec = msi_vector(data);
>> -    uint32_t gflags = msi_gflags(data, addr);
>> +    uint32_t gflags = masked ? 0 : (1u << XEN_PT_GFLAGSSHIFT_UNMASKED);
>>      int rc = 0;
>>      uint64_t table_addr = 0;
>>
>> -    XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x gflags %#x"
>> -               " (entry: %#x)\n",
>> -               is_msix ? "-X" : "", pirq, gvec, gflags, msix_entry);
>> +    XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x addr %"PRIx64
>> +               " data %#x gflags %#x (entry: %#x)\n",
>> +               is_msix ? "-X" : "", pirq, addr, data, gflags, msix_entry);
>>
>>      if (is_msix) {
>>          table_addr = s->msix->mmio_base_addr;
>>      }
>>
>> -    gflags |= masked ? 0 : (1u << XEN_PT_GFLAGSSHIFT_UNMASKED);
>> -
>> -    rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec,
>> -                                  pirq, gflags, table_addr);
>> +    rc = xc_domain_update_msi_irq(xen_xc, xen_domid, pirq, addr,
>> +                                  data, gflags, table_addr);
>
>Are you trying to modify an existing API? That is not going to work. We
>want to be able to build QEMU against older versions of Xen, and it
>should work as well.

Yes. I thought it didn't matter, and I was definitely wrong. I will keep
compatibility by introducing a new API. A wrapper function, which calls the
old or the new API according to the Xen version, will be used here.

Thanks
Chao
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
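One way to realise the wrapper Chao describes, sketched under the
assumption of a QEMU-style version guard. The version cutoff value and the
new libxc entry-point name are placeholders, not committed API;
xc_domain_update_msi_irq() is the existing call:

/* Hypothetical compat shim: pass raw addr/data on newer Xen,
 * pre-digested gvec/gflags on older Xen. */
static int xen_pt_update_msi_compat(int pirq, uint64_t addr, uint32_t data,
                                    uint32_t gflags, uint64_t table_addr)
{
#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 41100 /* placeholder cutoff */
    /* placeholder name for the new API */
    return xc_domain_update_msi_irq_dest(xen_xc, xen_domid, pirq,
                                         addr, data, gflags, table_addr);
#else
    /* pre-existing behaviour: QEMU digests the MSI message itself */
    return xc_domain_update_msi_irq(xen_xc, xen_domid, msi_vector(data),
                                    pirq, msi_gflags(data, addr),
                                    table_addr);
#endif
}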
Re: [Xen-devel] [PATCH v3 3/3] msi: Handle remappable format interrupt request
On Mon, Dec 11, 2017 at 06:07:48PM +0000, Anthony PERARD wrote:
>On Fri, Nov 17, 2017 at 02:24:25PM +0800, Chao Gao wrote:
>> According to the VT-d spec "Interrupt Remapping and Interrupt Posting ->
>> Interrupt Remapping -> Interrupt Request Formats On Intel 64
>> Platforms", fields of the MSI data register have changed. This patch
>> avoids wrongly regarding a remappable format interrupt request as an
>> interrupt bound to a pirq.
>>
>> Signed-off-by: Chao Gao
>> Signed-off-by: Lan Tianyu
>> ---
>> v3:
>>  - clarify that the interrupt format bit is Intel-specific; it is
>>    therefore improper to define MSI_ADDR_IF_MASK in a common header.
>> ---
>>  hw/i386/xen/xen-hvm.c | 10 +-
>>  hw/pci/msi.c          |  5 +++--
>>  hw/pci/msix.c         |  4 +++-
>>  hw/xen/xen_pt_msi.c   |  2 +-
>>  include/hw/xen/xen.h  |  2 +-
>>  stubs/xen-hvm.c       |  2 +-
>>  6 files changed, 18 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
>> index 8028bed..52dc8af 100644
>> --- a/hw/i386/xen/xen-hvm.c
>> +++ b/hw/i386/xen/xen-hvm.c
>> @@ -145,8 +145,16 @@ void xen_piix_pci_write_config_client(uint32_t address,
>>                                        uint32_t val, int len)
>>      }
>>  }
>>
>> -int xen_is_pirq_msi(uint32_t msi_data)
>> +int xen_is_pirq_msi(uint32_t msi_addr_lo, uint32_t msi_data)
>>  {
>> +    /* If the MSI address is configured in remapping format, the MSI will not
>> +     * be remapped into a pirq. This 'if' test excludes Intel-specific
>> +     * remappable msi.
>> +     */
>> +#define MSI_ADDR_IF_MASK 0x0010
>
>I don't think that is the right place for a define; they also exist
>outside of the context of the function.

Yes.

>That define would be better at the top of this file, I think. (There is
>probably a better place in the common headers, but I'm not sure where.)

Will do.

Thanks
Chao
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
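For context on the new test (per the VT-d chapter cited in the commit
message): bit 4 of the MSI address low dword is the Interrupt Format bit;
when set, the address carries a remapping handle rather than destination
ID/mode bits, so none of the pirq logic applies. Restated as a tiny helper
(illustrative, not the patch's code):

#include <stdbool.h>
#include <stdint.h>

#define MSI_ADDR_IF_MASK 0x0010   /* Intel-specific Interrupt Format bit */

static inline bool msi_is_remappable(uint32_t msi_addr_lo)
{
    /* set = remappable format: not a candidate for pirq binding */
    return !!(msi_addr_lo & MSI_ADDR_IF_MASK);
}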
Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ pages to 4
On Fri, Dec 08, 2017 at 11:06:43AM +, Paul Durrant wrote: >> -Original Message- >> From: Chao Gao [mailto:chao@intel.com] >> Sent: 07 December 2017 06:57 >> To: Paul Durrant >> Cc: Stefano Stabellini ; Wei Liu >> ; Andrew Cooper ; Tim >> (Xen.org) ; George Dunlap ; >> xen-de...@lists.xen.org; Jan Beulich ; Ian Jackson >> >> Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 >> pages >> >> On Thu, Dec 07, 2017 at 08:41:14AM +, Paul Durrant wrote: >> >> -Original Message- >> >> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On >> Behalf >> >> Of Paul Durrant >> >> Sent: 06 December 2017 16:10 >> >> To: 'Chao Gao' >> >> Cc: Stefano Stabellini ; Wei Liu >> >> ; Andrew Cooper ; >> Tim >> >> (Xen.org) ; George Dunlap ; >> >> xen-de...@lists.xen.org; Jan Beulich ; Ian Jackson >> >> >> >> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of >> >> IOREQ page to 4 pages >> >> >> >> > -Original Message- >> >> > From: Chao Gao [mailto:chao@intel.com] >> >> > Sent: 06 December 2017 09:02 >> >> > To: Paul Durrant >> >> > Cc: xen-de...@lists.xen.org; Tim (Xen.org) ; Stefano >> >> > Stabellini ; Konrad Rzeszutek Wilk >> >> > ; Jan Beulich ; George >> >> > Dunlap ; Andrew Cooper >> >> > ; Wei Liu ; Ian >> Jackson >> >> > >> >> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page >> to 4 >> >> > pages >> >> > >> >> > On Wed, Dec 06, 2017 at 03:04:11PM +, Paul Durrant wrote: >> >> > >> -Original Message- >> >> > >> From: Chao Gao [mailto:chao@intel.com] >> >> > >> Sent: 06 December 2017 07:50 >> >> > >> To: xen-de...@lists.xen.org >> >> > >> Cc: Chao Gao ; Paul Durrant >> >> > >> ; Tim (Xen.org) ; Stefano >> >> > Stabellini >> >> > >> ; Konrad Rzeszutek Wilk >> >> > >> ; Jan Beulich ; >> George >> >> > >> Dunlap ; Andrew Cooper >> >> > >> ; Wei Liu ; Ian >> >> > Jackson >> >> > >> >> >> > >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page >> to 4 >> >> > >> pages >> >> > >> >> >> > >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove >> the >> >> > vcpu >> >> > >> number constraint imposed by one IOREQ page, bump the number >> of >> >> > IOREQ >> >> > >> page to >> >> > >> 4 pages. With this patch, multiple pages can be used as IOREQ page. >> >> > >> >> >> > >> Basically, this patch extends 'ioreq' field in struct >> >> > >> hvm_ioreq_server >> to >> >> an >> >> > >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced >> >> > >> with >> >> > >> FOR_EACH_IOREQ_PAGE macro. >> >> > >> >> >> > >> In order to access an IOREQ page, QEMU should get the gmfn and >> map >> >> > this >> >> > >> gmfn >> >> > >> to its virtual address space. >> >> > > >> >> > >No. There's no need to extend the 'legacy' mechanism of using magic >> >> page >> >> > gfns. You should only handle the case where the mfns are allocated on >> >> > demand (see the call to hvm_ioreq_server_alloc_pages() in >> >> > hvm_get_ioreq_server_frame()). The number of guest vcpus is known >> at >> >> > this point so the correct number of pages can be allocated. If the >> >> > creator >> of >> >> > the ioreq server attempts to use the legacy >> hvm_get_ioreq_server_info() >> >> > and the guest has >128 vcpus then the call should fail. >> >> > >> >> > Great suggestion. I will introduce a new dmop, a variant of >> >> > hvm_get_ioreq_server_frame() for creator to get an array of gfns and >> the >> >> > size of array. And the leg
Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ pages to 4
On Tue, Dec 12, 2017 at 09:07:46AM +, Paul Durrant wrote: >> -Original Message- >[snip] >> >> Hi, Paul. >> >> I merged the two qemu patches, the privcmd patch [1] and did some tests. >> I encountered a small issue and report it to you, so you can pay more >> attention to it when doing some tests. The symptom is that using the new >> interface to map grant table in xc_dom_gnttab_seed() always fails. After >> adding some printk in privcmd, I found it is >> xen_remap_domain_gfn_array() that fails with errcode -16. Mapping ioreq >> server doesn't have such an issue. >> >> [1] >> http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=ce5 >> 9a05e6712 >> > >Chao, > > That privcmd patch is out of date. I've just pushed a new one: > >http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=9f00199f5f12cef401c6370c94a1140de9b318fc > > Give that a try. I've been using it for a few weeks now. Mapping ioreq server always fails, while mapping grant table succeeds. QEMU fails with following log: xenforeignmemory: error: ioctl failed: Device or resource busy qemu-system-i386: failed to map ioreq server resources: error 16 handle=0x5614a6df5e00 qemu-system-i386: xen hardware virtual machine initialisation failed Xen encountered the following error: (XEN) [13118.909787] mm.c:1003:d0v109 pg_owner d2 l1e_owner d0, but real_pg_owner d0 (XEN) [13118.918122] mm.c:1079:d0v109 Error getting mfn 5da5841 (pfn ) from L1 entry 805da5841227 for l1e_owner d0, pg_owner d2 I only fixed some obvious issues with a patch to your privcmd patch: --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -181,7 +181,7 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma, if (xen_feature(XENFEAT_auto_translated_physmap)) return -EOPNOTSUPP; - return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages); + return do_remap_pfn(vma, addr, &gfn, nr, NULL, prot, domid, false, pages } EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range); @@ -200,8 +200,8 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma, * cause of "wrong memory was mapped in". */ BUG_ON(err_ptr == NULL); -do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid, -false, pages); + return do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid, + false, pages); } EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array); Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
On Thu, Dec 14, 2017 at 02:50:17PM +0000, Paul Durrant wrote:
>> -----Original Message-----
>> >
>> > Hmm. That looks like it is because the ioreq server pages are not owned by
>> > the correct domain. The Xen patch series underwent some changes later in
>> > review and I did not re-test my QEMU patch after that so I wonder if
>> > mapping IOREQ pages has simply become broken. I'll investigate.
>> >
>>
>> I have reproduced the problem locally now. Will try to figure out the bug
>> tomorrow.
>>
>
>Chao,
>
>  Can you try my new branch
>  http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git;a=shortlog;h=refs/heads/ioreq24?
>
>  The problem was indeed that the ioreq pages were owned by the emulating
>  domain rather than the target domain, which is no longer compatible with
>  privcmd's use of HYPERVISOR_mmu_update.

Of course. I tested this branch. It works well.

But I think your privcmd patch shouldn't set 'err_ptr' to NULL when calling
xen_remap_domain_mfn_array(). It only works because the ioreq page is
allocated right before the bufioreq page, so the two happen to be
contiguous.

Thanks
Chao
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v15 04/11] x86/hvm/ioreq: defer mapping gfns until they are actually requested
On Thu, Dec 14, 2017 at 05:41:37PM +, Paul Durrant wrote: >A subsequent patch will introduce a new scheme to allow an emulator to >map ioreq server pages directly from Xen rather than the guest P2M. > >This patch lays the groundwork for that change by deferring mapping of >gfns until their values are requested by an emulator. To that end, the >pad field of the xen_dm_op_get_ioreq_server_info structure is re-purposed >to a flags field and new flag, XEN_DMOP_no_gfns, defined which modifies the >behaviour of XEN_DMOP_get_ioreq_server_info to allow the caller to avoid >requesting the gfn values. > >Signed-off-by: Paul Durrant >Reviewed-by: Roger Pau Monné >Acked-by: Wei Liu >Reviewed-by: Jan Beulich >--- >Cc: Ian Jackson >Cc: Andrew Cooper >Cc: George Dunlap >Cc: Konrad Rzeszutek Wilk >Cc: Stefano Stabellini >Cc: Tim Deegan > >v8: > - For safety make all of the pointers passed to > hvm_get_ioreq_server_info() optional. > - Shrink bufioreq_handling down to a uint8_t. > >v3: > - Updated in response to review comments from Wei and Roger. > - Added a HANDLE_BUFIOREQ macro to make the code neater. > - This patch no longer introduces a security vulnerability since there > is now an explicit limit on the number of ioreq servers that may be > created for any one domain. >--- > tools/libs/devicemodel/core.c | 8 + > tools/libs/devicemodel/include/xendevicemodel.h | 6 ++-- > xen/arch/x86/hvm/dm.c | 9 +++-- > xen/arch/x86/hvm/ioreq.c| 47 ++--- > xen/include/asm-x86/hvm/domain.h| 2 +- > xen/include/public/hvm/dm_op.h | 32 ++--- > 6 files changed, 63 insertions(+), 41 deletions(-) > >diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c >index 355b7dec18..df2a8a0fe7 100644 >--- a/tools/libs/devicemodel/core.c >+++ b/tools/libs/devicemodel/core.c >@@ -204,6 +204,14 @@ int xendevicemodel_get_ioreq_server_info( > > data->id = id; > >+/* >+ * If the caller is not requesting gfn values then instruct the >+ * hypercall not to retrieve them as this may cause them to be >+ * mapped. >+ */ >+if (!ioreq_gfn && !bufioreq_gfn) >+data->flags |= XEN_DMOP_no_gfns; >+ > rc = xendevicemodel_op(dmod, domid, 1, &op, sizeof(op)); > if (rc) > return rc; >diff --git a/tools/libs/devicemodel/include/xendevicemodel.h >b/tools/libs/devicemodel/include/xendevicemodel.h >index dda0bc7695..fffee3a4a0 100644 >--- a/tools/libs/devicemodel/include/xendevicemodel.h >+++ b/tools/libs/devicemodel/include/xendevicemodel.h >@@ -61,11 +61,11 @@ int xendevicemodel_create_ioreq_server( > * @parm domid the domain id to be serviced > * @parm id the IOREQ Server id. > * @parm ioreq_gfn pointer to a xen_pfn_t to receive the synchronous ioreq >- * gfn >+ * gfn. (May be NULL if not required) > * @parm bufioreq_gfn pointer to a xen_pfn_t to receive the buffered ioreq >- *gfn >+ *gfn. (May be NULL if not required) > * @parm bufioreq_port pointer to a evtchn_port_t to receive the buffered >- * ioreq event channel >+ * ioreq event channel. (May be NULL if not required) > * @return 0 on success, -1 on failure. 
> */ > int xendevicemodel_get_ioreq_server_info( >diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c >index a787f43737..3c617bd754 100644 >--- a/xen/arch/x86/hvm/dm.c >+++ b/xen/arch/x86/hvm/dm.c >@@ -416,16 +416,19 @@ static int dm_op(const struct dmop_args *op_args) > { > struct xen_dm_op_get_ioreq_server_info *data = > &op.u.get_ioreq_server_info; >+const uint16_t valid_flags = XEN_DMOP_no_gfns; > > const_op = false; > > rc = -EINVAL; >-if ( data->pad ) >+if ( data->flags & ~valid_flags ) > break; > > rc = hvm_get_ioreq_server_info(d, data->id, >- &data->ioreq_gfn, >- &data->bufioreq_gfn, >+ (data->flags & XEN_DMOP_no_gfns) ? >+ NULL : &data->ioreq_gfn, >+ (data->flags & XEN_DMOP_no_gfns) ? >+ NULL : &data->bufioreq_gfn, >&data->bufioreq_port); > break; > } >diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c >index f913ed31fa..284eefeac5 100644 >--- a/xen/arch/x86/hvm/ioreq.c >+++ b/xen/arch/x86/hvm/ioreq.c >@@ -350,6 +350,9 @@ static void hvm_update_ioreq_evtchn(struct >hvm_ioreq_server *s, > } > } > >+#define HANDLE_BUFIOREQ(s) \ >+((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) >+ > static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, > struct vcpu *v) > { >@@ -371
Re: [Xen-devel] [RFC Patch] xen/pt: Emulate FLR capability
On Thu, Aug 29, 2019 at 12:21:11PM +0200, Roger Pau Monné wrote: >On Thu, Aug 29, 2019 at 05:02:27PM +0800, Chao Gao wrote: >> Currently, for an HVM guest on Xen, no reset method is virtualized, so from the >> VM's perspective, assigned devices cannot be reset. But some devices rely on PCI >> reset to recover from hardware hangs. When assigned to a VM, such devices >> cannot be reset and won't work any longer once a hardware hang occurs; we have >> to reboot the VM to trigger a PCI reset on the host to recover the device. >> >> This patch exposes the FLR capability to VMs if the assigned device can be reset on >> the host. When a VM initiates an FLR to a device, qemu cleans up the device state >> (including disabling INTx and/or MSI, unmapping BARs from the guest and deleting >> emulated registers), then initiates a PCI reset through the 'reset' knob under the >> device's sysfs node, and finally initializes the device again. > >I think you likely need to deassign the device from the VM, perform >the reset, and then assign the device again, so that there's no Xen >internal state carried over prior to the reset? Yes, that is the safest way. But here I want to present the feature as FLR, such that the device driver in the guest can issue a PCI reset whenever needed and no change is needed to the device driver. The current device deassignment notifies the guest that the device is going to be removed, which is not a standard PCI reset. Is it possible to make the guest unaware of the device deassignment, so as to emulate a standard PCI reset? In my mind, we could expose do_pci_remove/add to qemu or rewrite them in qemu (but without removing the device from the guest's PCI hierarchy). Do you think this is the right direction? Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
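For illustration, a minimal sketch (not from the patch) of the host-side reset step described above: on Linux, userspace can trigger a PCI function reset by writing "1" to the device's sysfs 'reset' knob. The BDF in the example path is made up.

    #include <fcntl.h>
    #include <unistd.h>

    /* e.g. path = "/sys/bus/pci/devices/0000:03:00.0/reset" */
    static int pci_host_reset(const char *path)
    {
        int rc = -1;
        int fd = open(path, O_WRONLY);

        if (fd < 0)
            return -1;
        /* Writing "1" asks the kernel to reset the function (FLR if available). */
        if (write(fd, "1", 1) == 1)
            rc = 0;
        close(fd);
        return rc;
    }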
Re: [Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode
On Fri, Aug 30, 2019 at 02:35:06PM +0800, Chao Gao wrote: >On Thu, Aug 29, 2019 at 02:11:10PM +0200, Jan Beulich wrote: >>On 27.08.2019 06:52, Chao Gao wrote: >>> On Mon, Aug 26, 2019 at 04:07:59PM +0800, Chao Gao wrote: >>>> On Fri, Aug 23, 2019 at 09:46:37AM +0100, Sergey Dyasli wrote: >>>>> On 19/08/2019 02:25, Chao Gao wrote: >>>>>> register an nmi callback. And this callback does busy-loop on threads >>>>>> which are waiting for loading completion. Control threads send NMI to >>>>>> slave threads to prevent NMI acceptance during ucode loading. >>>>>> >>>>>> Signed-off-by: Chao Gao >>>>>> --- >>>>>> Changes in v9: >>>>>> - control threads send NMI to all other threads. Slave threads will >>>>>> stay in the NMI handling to prevent NMI acceptance during ucode >>>>>> loading. Note that self-nmi is invalid according to SDM. >>>>> >>>>> To me this looks like a half-measure: why keep only slave threads in >>>>> the NMI handler, when master threads can update the microcode from >>>>> inside the NMI handler as well? >>>> >>>> No special reason. Because the issue we want to address is that slave >>>> threads might go to handle NMI and access MSRs when master thread is >>>> loading ucode. So we only keep slave threads in the NMI handler. >>>> >>>>> >>>>> You mention that self-nmi is invalid, but Xen has self_nmi() which is >>>>> used for apply_alternatives() during boot, so can be trusted to work. >>>> >>>> Sorry, I meant using self shorthand to send self-nmi. I tried to use >>>> self shorthand but got APIC error. And I agree that it is better to >>>> make slave thread call self_nmi() itself. >>>> >>>>> >>>>> I experimented a bit with the following approach: after loading_state >>>>> becomes LOADING_CALLIN, each cpu issues a self_nmi() and rendezvous >>>>> via cpu_callin_map into LOADING_ENTER to do a ucode update directly in >>>>> the NMI handler. And it seems to work. >>>>> >>>>> Separate question is about the safety of this approach: can we be sure >>>>> that a ucode update would not reset the status of the NMI latch? I.e. >>>>> can it cause another NMI to be delivered while Xen already handles one? >>>> >>>> Ashok, what's your opinion on Sergey's approach and his concern? >>> >>> I talked with Ashok. We think your approach is better. I will follow >>> your approach in v10. It would be much helpful if you post your patch >>> so that I can just rebase it onto other patches. >> >>Doing the actual ucode update inside an NMI handler seems rather risky >>to me. Even if Ashok confirmed it would not be an issue on past and >>current Intel CPUs - what about future ones, or ones from other vendors? > The Intel SDM doesn't say that loading ucode inside an NMI handler is disallowed, so it is implicitly allowed. If future CPUs cannot load ucode in an NMI handler, the SDM should document that, and at that point we can move ucode loading out of the NMI handler for those new CPUs. As to AMD, if someone objects to this approach, let's use it for Intel only. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
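For illustration, a rough sketch (hypothetical, simplified from the discussion above) of how a callback registered via set_nmi_callback() could park secondary threads while the primary thread of each core performs the update; names follow the series but details may differ from the final patch:

    static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
    {
        /* Primary threads, and CPUs outside an update, handle NMIs as usual. */
        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ||
             ACCESS_ONCE(loading_state) != LOADING_CALLIN )
            return 0;

        cpumask_set_cpu(cpu, &cpu_callin_map);

        /* Spin here so no MSR access can race with the ongoing update. */
        while ( ACCESS_ONCE(loading_state) != LOADING_EXIT )
            cpu_relax();

        return 0;
    }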
Re: [Xen-devel] [ANNOUNCE] Xen 4.13 Development Update
On Fri, Sep 06, 2019 at 09:40:58AM +0200, Juergen Gross wrote: >This email only tracks big items for xen.git tree. Please reply for items you >would like to see in 4.13 so that people have an idea what is going on and >prioritise accordingly. > >=== x86 === > >* HVM guest CPU topology support (RFC) > - Chao Gao I have no plan to continue this one for now; please drop it. > >* Improve late microcode loading (v9) > - Chao Gao > I am working on v10 and would like to get it merged in 4.13. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v10 12/16] x86/microcode: Synchronize late microcode loading
This patch ports microcode improvement patches from the Linux kernel. Before you read any further: the early loading method is still the preferred one and you should always use it. The following patch improves the late loading mechanism for long-running jobs and cloud use cases. Gather all cores and serialize the microcode update on them by doing it one-by-one, to make the late update process as reliable as possible and avoid potential issues caused by the microcode update. Signed-off-by: Chao Gao Tested-by: Chao Gao [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff] [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7] Cc: Kevin Tian Cc: Jun Nakajima Cc: Ashok Raj Cc: Borislav Petkov Cc: Thomas Gleixner Cc: Andrew Cooper Cc: Jan Beulich --- Changes in v10: - introduce wait_for_state() and set_state() helper functions - make wait_for_condition() return bool and take const void * - disable/enable watchdog in control thread - rename "master" and "slave" thread to "primary" and "secondary" Changes in v9: - log __builtin_return_address(0) when timeout - divide CPUs into three logical sets which call different functions during ucode loading. The 'control thread' is chosen to coordinate ucode loading on all CPUs. Since only the control thread would set 'loading_state', we can get rid of the 'cmpxchg' stuff in v8. - s/rep_nop/cpu_relax - each thread updates its revision number itself - add XENLOG_ERR prefix for each line of multi-line log messages Changes in v8: - to support blocking #NMI handling during loading ucode * introduce a flag, 'loading_state', to mark the start or end of ucode loading. * use a bitmap for cpu callin: since a cpu may stay in #NMI handling, there are two places for a cpu to call in; a bitmap won't be counted twice. * don't wait for all CPUs to call out, just wait for the CPUs that perform the update. We have to do this because some threads may be stuck in NMI handling (where they cannot reach the rendezvous). - emit a warning if the system stays in stop_machine context for more than 1s - comment that rdtsc is fine while loading an update - use cmpxchg() to avoid panic being called on multiple CPUs - propagate the revision number to other threads - refine comments and prompt messages Changes in v7: - check whether 'timeout' is 0 rather than "<=0" since it is unsigned int. - reword the comment above microcode_update_cpu() to clearly state that one thread per core should do the update. --- xen/arch/x86/microcode.c | 296 ++- 1 file changed, 269 insertions(+), 27 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index c2ea20f..049eda6 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -30,18 +30,52 @@ #include #include #include +#include #include #include #include +#include +#include #include #include #include #include +/* + * Before performing a late microcode update on any thread, we + * rendezvous all cpus in stop_machine context. The timeout for + * waiting for cpu rendezvous is 30ms. It is the timeout used by + * live patching. + */ +#define MICROCODE_CALLIN_TIMEOUT_US 30000 + +/* + * Timeout for each thread to complete the update is set to 1s. It is a + * conservative choice considering all possible interference. + */ +#define MICROCODE_UPDATE_TIMEOUT_US 1000000 + static module_t __initdata ucode_mod; static signed int __initdata ucode_mod_idx; static bool_t __initdata ucode_mod_forced; +static unsigned int nr_cores; + +/* + * These states help to coordinate CPUs during loading an update. 
+ * + * The semantics of each state are as follows: + * - LOADING_PREPARE: initial state of 'loading_state'. + * - LOADING_CALLIN: CPUs are allowed to call in. + * - LOADING_ENTER: all CPUs have called in. Initiate ucode loading. + * - LOADING_EXIT: ucode loading is done or aborted. + */ +static enum { +LOADING_PREPARE, +LOADING_CALLIN, +LOADING_ENTER, +LOADING_EXIT, +} loading_state; /* * If we scan the initramfs.cpio for the early microcode code @@ -190,6 +224,16 @@ static DEFINE_SPINLOCK(microcode_mutex); DEFINE_PER_CPU(struct cpu_signature, cpu_sig); /* + * Count the CPUs that have entered and exited the rendezvous, and that have + * succeeded in the microcode update during a late update, respectively. + * + * Note that a bitmap is used for callin to allow a cpu to set its bit multiple + * times; this is required because CPUs may busy-loop in #NMI handling. + */ +static cpumask_t cpu_callin_map; +static atomic_t cpu_out, cpu_updated; + +/* * Return a patch that covers the current CPU. If there are multiple patches, * return the one with the highest revision number. Return an error if no * patch is found and an error occurs during the parsing process. Otherwise @@ -231,6 +275,34 @@ static bool microcod
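For illustration, a sketch of the wait_for_state()/set_state() helpers named in the v10 changelog above, written against the 'loading_state' enum just shown; details may differ from the actual patch:

    static bool wait_for_state(typeof(loading_state) state)
    {
        typeof(loading_state) cur_state;

        while ( (cur_state = ACCESS_ONCE(loading_state)) != state )
        {
            if ( cur_state == LOADING_EXIT )
                return false;   /* loading was aborted */
            cpu_relax();
        }

        return true;
    }

    static void set_state(typeof(loading_state) state)
    {
        ACCESS_ONCE(loading_state) = state;
    }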
[Xen-devel] [PATCH v10 06/16] microcode: remove pointless 'cpu' parameter
Some callbacks in microcode_ops or related functions take a cpu id parameter. But at the current call sites, the cpu id parameter is always equal to the current cpu id. Some of them even use an assertion to guarantee this. Remove this redundant 'cpu' parameter. Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- Changes in v9: - use a convenience variable 'cpu' in collect_cpu_info() on AMD side - rebase and fix conflicts Changes in v8: - Use current_cpu_data in collect_cpu_info() - keep the cpu parameter of check_final_patch_levels() - use smp_processor_id() in get_matching_microcode() rather than define a local variable and label it "__maybe_unused" --- xen/arch/x86/acpi/power.c | 2 +- xen/arch/x86/microcode.c| 20 xen/arch/x86/microcode_amd.c| 34 +- xen/arch/x86/microcode_intel.c | 41 +++-- xen/arch/x86/smpboot.c | 2 +- xen/include/asm-x86/microcode.h | 7 +++ xen/include/asm-x86/processor.h | 2 +- 7 files changed, 42 insertions(+), 66 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index e3954ee..269b140 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -278,7 +278,7 @@ static int enter_state(u32 state) console_end_sync(); -microcode_resume_cpu(0); +microcode_resume_cpu(); if ( !recheck_cpu_features(0) ) panic("Missing previously available feature(s)\n"); diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index d17dbec..89a8d2b 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -196,19 +196,19 @@ struct microcode_info { char buffer[1]; }; -int microcode_resume_cpu(unsigned int cpu) +int microcode_resume_cpu(void) { int err; -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); +struct cpu_signature *sig = &this_cpu(cpu_sig); if ( !microcode_ops ) return 0; spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, sig); +err = microcode_ops->collect_cpu_info(sig); if ( likely(!err) ) -err = microcode_ops->apply_microcode(cpu); +err = microcode_ops->apply_microcode(); spin_unlock(&microcode_mutex); return err; @@ -257,9 +257,9 @@ static int microcode_update_cpu(const void *buf, size_t size) spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, sig); +err = microcode_ops->collect_cpu_info(sig); if ( likely(!err) ) -err = microcode_ops->cpu_request_microcode(cpu, buf, size); +err = microcode_ops->cpu_request_microcode(buf, size); spin_unlock(&microcode_mutex); return err; @@ -348,8 +348,6 @@ __initcall(microcode_init); int __init early_microcode_update_cpu(bool start_update) { -unsigned int cpu = smp_processor_id(); -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); int rc = 0; void *data = NULL; size_t len; @@ -368,7 +366,7 @@ int __init early_microcode_update_cpu(bool start_update) data = bootstrap_map(&ucode_mod); } -microcode_ops->collect_cpu_info(cpu, sig); +microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); if ( data ) { @@ -386,8 +384,6 @@ int __init early_microcode_update_cpu(bool start_update) int __init early_microcode_init(void) { -unsigned int cpu = smp_processor_id(); -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); int rc; rc = microcode_init_intel(); @@ -400,7 +396,7 @@ int __init early_microcode_init(void) if ( microcode_ops ) { -microcode_ops->collect_cpu_info(cpu, sig); +microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); if ( ucode_mod.mod_end || ucode_blob.size ) rc = early_microcode_update_cpu(true); diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 69c9cfe..1d27c71 100644 --- a/xen/arch/x86/microcode_amd.c +++ 
b/xen/arch/x86/microcode_amd.c @@ -78,8 +78,9 @@ struct mpbhdr { static DEFINE_SPINLOCK(microcode_update_lock); /* See comment in start_update() for cases when this routine fails */ -static int collect_cpu_info(unsigned int cpu, struct cpu_signature *csig) +static int collect_cpu_info(struct cpu_signature *csig) { +unsigned int cpu = smp_processor_id(); struct cpuinfo_x86 *c = &cpu_data[cpu]; memset(csig, 0, sizeof(*csig)); @@ -153,17 +154,15 @@ static bool_t find_equiv_cpu_id(const struct equiv_cpu_entry *equiv_cpu_table, } static enum microcode_match_result microcode_fits( -const struct microcode_amd *mc_amd, unsigned int cpu) +const struct microcode_amd *mc_amd) { +unsigned int cpu = smp_processor_id(); const struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); const struct microcode_header_amd *mc_header = mc_amd->mpb; const str
[Xen-devel] [PATCH v10 02/16] microcode/amd: distinguish old and mismatched ucode in microcode_fits()
Sometimes, an ucode with a level lower than or equal to the current CPU's patch level is useful. For example, to work around a broken bios which only loads ucode for the BSP: when the BSP parses an ucode blob during boot, it is better to also save an ucode with a lower or equal level for the APs. No functional change is made in this patch, but a following patch will handle "old ucode" and "mismatched ucode" separately. Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- Changes in v8: - new --- xen/arch/x86/microcode_amd.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 9b74330..7fa700b 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -152,8 +152,8 @@ static bool_t find_equiv_cpu_id(const struct equiv_cpu_entry *equiv_cpu_table, return 0; } -static bool_t microcode_fits(const struct microcode_amd *mc_amd, - unsigned int cpu) +static enum microcode_match_result microcode_fits( +const struct microcode_amd *mc_amd, unsigned int cpu) { struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); const struct microcode_header_amd *mc_header = mc_amd->mpb; @@ -167,27 +167,27 @@ static bool_t microcode_fits(const struct microcode_amd *mc_amd, current_cpu_id = cpuid_eax(0x0001); if ( !find_equiv_cpu_id(equiv_cpu_table, current_cpu_id, &equiv_cpu_id) ) -return 0; +return MIS_UCODE; if ( (mc_header->processor_rev_id) != equiv_cpu_id ) -return 0; +return MIS_UCODE; if ( !verify_patch_size(mc_amd->mpb_size) ) { pr_debug("microcode: patch size mismatch\n"); -return 0; +return MIS_UCODE; } if ( mc_header->patch_id <= uci->cpu_sig.rev ) { pr_debug("microcode: patch is already at required level or greater.\n"); -return 0; +return OLD_UCODE; } pr_debug("microcode: CPU%d found a matching microcode update with version %#x (current=%#x)\n", cpu, mc_header->patch_id, uci->cpu_sig.rev); -return 1; +return NEW_UCODE; } static int apply_microcode(unsigned int cpu) @@ -496,7 +496,7 @@ static int cpu_request_microcode(unsigned int cpu, const void *buf, while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize, &offset)) == 0 ) { -if ( microcode_fits(mc_amd, cpu) ) +if ( microcode_fits(mc_amd, cpu) == NEW_UCODE ) { error = apply_microcode(cpu); if ( error ) @@ -579,7 +579,7 @@ static int microcode_resume_match(unsigned int cpu, const void *mc) struct microcode_amd *mc_amd = uci->mc.mc_amd; const struct microcode_amd *src = mc; -if ( !microcode_fits(src, cpu) ) +if ( microcode_fits(src, cpu) != NEW_UCODE ) return 0; if ( src != mc_amd ) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
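For illustration, a sketch of how a follow-up patch might act on the tri-state result; save_patch() is hypothetical and merely stands in for the caching logic introduced later in this series:

    switch ( microcode_fits(mc_amd, cpu) )
    {
    case NEW_UCODE:
        error = apply_microcode(cpu);  /* newer than the running revision */
        break;

    case OLD_UCODE:
        /* Hypothetical: still worth caching, e.g. for APs whose bios left
         * them on an older revision. */
        save_patch(mc_amd);
        break;

    case MIS_UCODE:
        /* Signature mismatch: not applicable to this CPU, skip it. */
        break;
    }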
[Xen-devel] [PATCH v10 03/16] microcode: introduce a global cache of ucode patch
to replace the current per-cpu cache 'uci->mc'. With the assumption that all CPUs in the system have the same signature (family, model, stepping and 'pf'), a microcode update that matches one cpu should match the others as well. Having differing microcode revisions on cpus would make the system unstable and should be avoided. Hence, caching one microcode update is good enough for all cases. Introduce a global variable, microcode_cache, to store the newest matching microcode update. Whenever we get a new valid microcode update, its revision id is compared against that of the cached update to determine whether "microcode_cache" needs to be replaced. And this global cache is loaded to the cpu in apply_microcode(). All operations on the cache are protected by 'microcode_mutex'. Note that I deliberately avoid touching the old per-cpu cache ('uci->mc') as I am going to remove it completely in the following patches. We copy everything to create the new cache blob to avoid reusing some buffers previously allocated for the old per-cpu cache. It is not efficient, but this is corrected by a later patch in this series. Signed-off-by: Chao Gao Reviewed-by: Roger Pau Monné --- Changes in v10: - assert mismatched ucode won't be passed to ->compare_patch. - return -ENOENT if patch is NULL in .apply_microcode(). - check against NULL pointer dereference in free_patch() on AMD side - cosmetic changes suggested by Roger and Jan. Changes in v9: - on Intel side, ->compare_patch just checks the patch revision number. - explain why all buffers are copied in alloc_microcode_patch() in patch description. Changes in v8: - Free generic wrapper struct in general code - Try to update cache as long as a patch covers current cpu. Previously, the cache was updated only if the patch was newer than the current update revision in the CPU. This small difference helps work around a broken bios which only applies the microcode update to the BSP, leaving software to apply the same update to the other CPUs. Changes in v7: - reworked to cache only one microcode patch rather than a list of microcode patches. --- xen/arch/x86/microcode.c| 38 xen/arch/x86/microcode_amd.c| 98 ++--- xen/arch/x86/microcode_intel.c | 81 +++--- xen/include/asm-x86/microcode.h | 16 +++ 4 files changed, 211 insertions(+), 22 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 421d57e..e218a9d 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -61,6 +61,9 @@ static struct ucode_mod_blob __initdata ucode_blob; */ static bool_t __initdata ucode_scan; +/* Protected by microcode_mutex */ +static struct microcode_patch *microcode_cache; + void __init microcode_set_module(unsigned int idx) { ucode_mod_idx = idx; @@ -262,6 +265,41 @@ int microcode_resume_cpu(unsigned int cpu) return err; } +void microcode_free_patch(struct microcode_patch *microcode_patch) +{ +microcode_ops->free_patch(microcode_patch->mc); +xfree(microcode_patch); +} + +const struct microcode_patch *microcode_get_cache(void) +{ +ASSERT(spin_is_locked(&microcode_mutex)); + +return microcode_cache; +} + +/* Return true if cache gets updated. 
Otherwise, return false */ +bool microcode_update_cache(struct microcode_patch *patch) +{ +ASSERT(spin_is_locked(&microcode_mutex)); + +if ( !microcode_cache ) +microcode_cache = patch; +else if ( microcode_ops->compare_patch(patch, + microcode_cache) == NEW_UCODE ) +{ +microcode_free_patch(microcode_cache); +microcode_cache = patch; +} +else +{ +microcode_free_patch(patch); +return false; +} + +return true; +} + static int microcode_update_cpu(const void *buf, size_t size) { int err; diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 7fa700b..2dca1df 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -190,25 +190,92 @@ static enum microcode_match_result microcode_fits( return NEW_UCODE; } +static bool match_cpu(const struct microcode_patch *patch) +{ +if ( !patch ) +return false; +return microcode_fits(patch->mc_amd, smp_processor_id()) == NEW_UCODE; +} + +static struct microcode_patch *alloc_microcode_patch( +const struct microcode_amd *mc_amd) +{ +struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch); +struct microcode_amd *cache = xmalloc(struct microcode_amd); +void *mpb = xmalloc_bytes(mc_amd->mpb_size); +struct equiv_cpu_entry *equiv_cpu_table = +xmalloc_bytes(mc_amd->equiv_cpu_table_size); + +if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table ) +{ +xfree(microcode_patch); +xfree(cache);
[Xen-devel] [PATCH v10 15/16] microcode: disable late loading if CPUs are affected by BDF90
It ports the implementation of is_blacklisted() from the Linux kernel to Xen. Late loading may cause a system hang if CPUs are affected by BDF90, so check against BDF90 before performing a late load. Signed-off-by: Chao Gao --- xen/arch/x86/microcode.c| 6 ++ xen/arch/x86/microcode_intel.c | 23 +++ xen/include/asm-x86/microcode.h | 1 + 3 files changed, 30 insertions(+) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 64a4321..dbd2730 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -561,6 +561,12 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) if ( microcode_ops == NULL ) return -EINVAL; +if ( microcode_ops->is_blacklisted && microcode_ops->is_blacklisted() ) +{ +printk(XENLOG_WARNING "Late ucode loading is disabled!\n"); +return -EPERM; +} + buffer = xmalloc_bytes(len); if ( !buffer ) return -ENOMEM; diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index 19f1ba0..bcef668 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -28,6 +28,7 @@ #include #include +#include #include #include #include @@ -283,6 +284,27 @@ static enum microcode_match_result compare_patch( : OLD_UCODE; } +static bool is_blacklisted(void) +{ +struct cpuinfo_x86 *c = &current_cpu_data; +uint64_t llc_size = c->x86_cache_size * 1024ULL; +struct cpu_signature *sig = &this_cpu(cpu_sig); + +do_div(llc_size, c->x86_max_cores); + +/* + * Late loading on model 79 with microcode revision less than 0x0b21 + * and LLC size per core bigger than 2.5MB may result in a system hang. + * This behavior is documented in item BDF90, #334165 (Intel Xeon + * Processor E7-8800/4800 v4 Product Family). + */ +if ( c->x86 == 6 && c->x86_model == 0x4F && c->x86_mask == 0x1 && + llc_size > 2621440 && sig->rev < 0x0b21 ) +return true; + +return false; +} + static int apply_microcode(const struct microcode_patch *patch) { uint64_t msr_content; @@ -415,6 +437,7 @@ static const struct microcode_ops microcode_intel_ops = { .free_patch = free_patch, .compare_patch= compare_patch, .match_cpu= match_cpu, +.is_blacklisted = is_blacklisted, }; int __init microcode_init_intel(void) diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h index 7d5a1f8..9ffd9d2 100644 --- a/xen/include/asm-x86/microcode.h +++ b/xen/include/asm-x86/microcode.h @@ -30,6 +30,7 @@ struct microcode_ops { bool (*match_cpu)(const struct microcode_patch *patch); enum microcode_match_result (*compare_patch)( const struct microcode_patch *new, const struct microcode_patch *old); +bool (*is_blacklisted)(void); }; struct cpu_signature { -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
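For reference, the 2621440 threshold in the check above is simply 2.5 MiB expressed in bytes (2.5 * 1024 * 1024 = 2621440), matching the "LLC size per core bigger than 2.5MB" condition that BDF90 describes.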
[Xen-devel] [PATCH v10 07/16] microcode/amd: call svm_host_osvw_init() in common code
Introduce a vendor hook, .end_update_percpu, for svm_host_osvw_init(). The hook function is called on each cpu after loading an update. It is a preparation for splitting out apply_microcode() from cpu_request_microcode(). Note that svm_host_osvw_init() should be called regardless of the result of loading an update. Signed-off-by: Chao Gao Reviewed-by: Roger Pau Monné --- Changes in v10: - rename end_update to end_update_percpu. - use #ifdef rather than #if and frame the implementation with Changes in v9: - call .end_update in early loading path - on AMD side, initialize .{start,end}_update only if "CONFIG_HVM" is true. --- xen/arch/x86/microcode.c| 10 +- xen/arch/x86/microcode_amd.c| 25 - xen/include/asm-x86/microcode.h | 1 + 3 files changed, 22 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 89a8d2b..5c82a2d 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -276,6 +276,9 @@ static long do_microcode_update(void *_info) if ( error ) info->error = error; +if ( microcode_ops->end_update_percpu ) +microcode_ops->end_update_percpu(); + info->cpu = cpumask_next(info->cpu, &cpu_online_map); if ( info->cpu < nr_cpu_ids ) return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info); @@ -376,7 +379,12 @@ int __init early_microcode_update_cpu(bool start_update) if ( rc ) return rc; -return microcode_update_cpu(data, len); +rc = microcode_update_cpu(data, len); + +if ( microcode_ops->end_update_percpu ) +microcode_ops->end_update_percpu(); + +return rc; } else return -ENOMEM; diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 1d27c71..c96a3b3 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -600,10 +600,6 @@ static int cpu_request_microcode(const void *buf, size_t bufsize) free_patch(mc_amd); out: -#if CONFIG_HVM -svm_host_osvw_init(); -#endif - /* * In some cases we may return an error even if processor's microcode has * been updated. For example, the first patch in a container file is loaded @@ -613,29 +609,32 @@ static int cpu_request_microcode(const void *buf, size_t bufsize) return error; } +#ifdef CONFIG_HVM static int start_update(void) { -#if CONFIG_HVM /* - * We assume here that svm_host_osvw_init() will be called on each cpu (from - * cpu_request_microcode()). - * - * Note that if collect_cpu_info() returns an error then - * cpu_request_microcode() will not invoked thus leaving OSVW bits not - * updated. Currently though collect_cpu_info() will not fail on processors - * supporting OSVW so we will not deal with this possibility. + * svm_host_osvw_init() will be called on each cpu by calling '.end_update_percpu' + * in common code. 
*/ svm_host_osvw_reset(); -#endif return 0; } +static void end_update_percpu(void) +{ +svm_host_osvw_init(); +} +#endif + static const struct microcode_ops microcode_amd_ops = { .cpu_request_microcode= cpu_request_microcode, .collect_cpu_info = collect_cpu_info, .apply_microcode = apply_microcode, +#ifdef CONFIG_HVM .start_update = start_update, +.end_update_percpu= end_update_percpu, +#endif .free_patch = free_patch, .compare_patch= compare_patch, .match_cpu= match_cpu, diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h index f2a5ea4..b0eee0e 100644 --- a/xen/include/asm-x86/microcode.h +++ b/xen/include/asm-x86/microcode.h @@ -24,6 +24,7 @@ struct microcode_ops { int (*collect_cpu_info)(struct cpu_signature *csig); int (*apply_microcode)(void); int (*start_update)(void); +void (*end_update_percpu)(void); void (*free_patch)(void *mc); bool (*match_cpu)(const struct microcode_patch *patch); enum microcode_match_result (*compare_patch)( -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v10 13/16] microcode: remove microcode_update_lock
microcode_update_lock is to prevent logical threads of the same core from updating microcode at the same time. But since it is a global lock, it also prevented parallel microcode updating on different cores. Remove this lock in order to update microcode in parallel. It is safe because we have already ensured serialization of sibling threads at the caller side. 1. For late microcode update, do_microcode_update() ensures that only one sibling thread of a core can update microcode. 2. For microcode update during system startup or CPU-hotplug, microcode_mutex guarantees update serialization of logical threads. 3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and late microcode update. Note that printk in apply_microcode() and svm_host_osvw_init() (for AMD only) are still processed sequentially. Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- Changes in v7: - reworked. Remove the complex lock logic introduced in v5 and v6. The microcode patch to be applied is passed as an argument without any global variable. Thus no lock is added to serialize potential readers/writers. Callers of apply_microcode() will guarantee the correctness: the patch pointed to by the arguments won't be changed by others. Changes in v6: - introduce early_ucode_update_lock to serialize early ucode update. Changes in v5: - newly add --- xen/arch/x86/microcode_amd.c | 8 +--- xen/arch/x86/microcode_intel.c | 8 +--- 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index f05db72..856caea 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -74,9 +74,6 @@ struct mpbhdr { uint8_t data[]; }; -/* serialize access to the physical write */ -static DEFINE_SPINLOCK(microcode_update_lock); - /* See comment in start_update() for cases when this routine fails */ static int collect_cpu_info(struct cpu_signature *csig) { @@ -232,7 +229,6 @@ static enum microcode_match_result compare_patch( static int apply_microcode(const struct microcode_patch *patch) { -unsigned long flags; uint32_t rev; int hw_err; unsigned int cpu = smp_processor_id(); @@ -247,15 +243,13 @@ static int apply_microcode(const struct microcode_patch *patch) hdr = patch->mc_amd->mpb; -spin_lock_irqsave(&microcode_update_lock, flags); +BUG_ON(local_irq_is_enabled()); hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr); /* get patch id after patching */ rdmsrl(MSR_AMD_PATCHLEVEL, rev); -spin_unlock_irqrestore(&microcode_update_lock, flags); - /* * Some processors leave the ucode blob mapping as UC after the update. * Flush the mapping to regain normal cacheability. 
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index 4e811b7..19f1ba0 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -93,9 +93,6 @@ struct extended_sigtable { #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE) -/* serialize access to the physical write to MSR 0x79 */ -static DEFINE_SPINLOCK(microcode_update_lock); - static int collect_cpu_info(struct cpu_signature *csig) { unsigned int cpu_num = smp_processor_id(); @@ -288,7 +285,6 @@ static enum microcode_match_result compare_patch( static int apply_microcode(const struct microcode_patch *patch) { -unsigned long flags; uint64_t msr_content; unsigned int val[2]; unsigned int cpu_num = raw_smp_processor_id(); @@ -303,8 +299,7 @@ static int apply_microcode(const struct microcode_patch *patch) mc_intel = patch->mc_intel; -/* serialize access to the physical write to MSR 0x79 */ -spin_lock_irqsave(&microcode_update_lock, flags); +BUG_ON(local_irq_is_enabled()); /* write microcode via MSR 0x79 */ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits); @@ -317,7 +312,6 @@ static int apply_microcode(const struct microcode_patch *patch) rdmsrl(MSR_IA32_UCODE_REV, msr_content); val[1] = (uint32_t)(msr_content >> 32); -spin_unlock_irqrestore(&microcode_update_lock, flags); if ( val[1] != mc_intel->hdr.rev ) { printk(KERN_ERR "microcode: CPU%d update from revision " -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
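For illustration, a sketch of the caller-side guarantee relied upon above: only one sibling thread per core performs the write. Hypothetical snippet, not the exact hunk from do_microcode_update():

    static int do_microcode_update_sketch(const struct microcode_patch *patch)
    {
        unsigned int cpu = smp_processor_id();

        /* Only the first thread of each core loads the update; microcode is
         * per-core, so its siblings pick up the new revision as well. */
        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
            return microcode_ops->apply_microcode(patch);

        return 0;
    }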
[Xen-devel] [PATCH v10 00/16] improve late microcode loading
Major changes in version 10: - add back the patch to call wbinvd() conditionally - add a patch to disable late loading due to BDF90 - rendezvous CPUs in NMI handler and load ucode, but provide an option to disable this behavior. - avoid the call of self_nmi() on the control thread because it may trigger the unknown_nmi_error() in do_nmi(). - ensure ->start_update is called during system resuming from suspension Sergey, could you help to test this series on an AMD machine? Regarding the changes on the AMD side, I didn't run any tests for them due to lack of hardware. At least, two basic tests are needed: * do a microcode update after system bootup * don't bring all pCPUs up at bootup by specifying the maxcpus option in the xen command line, then do a microcode update and online all offlined CPUs via 'xen-hptool'. The intention of this series is to make late microcode loading more reliable by rendezvousing all cpus in stop_machine context. This idea comes from Ashok; I am porting his linux patch to Xen (see patch 12 for more details). This series includes the changes below: 1. Patch 1-11: introduce a global microcode cache and some cleanup 2. Patch 12: synchronize late microcode loading 3. Patch 13: support parallel microcode updates on different cores 4. Patch 14: block #NMI handling during microcode loading 5. Patch 15: disable late ucode loading due to BDF90 6. Patch 16: call wbinvd() conditionally Currently, late microcode loading does a lot of things, including parsing the microcode blob, checking the signature/revision and performing the update. Putting all of them into stop_machine context is a bad idea because of complexity (one issue I observed is that memory allocation triggered an assertion in stop_machine context). To simplify the load process, parsing microcode is moved out of it; the remaining parts of the load process are put into stop_machine context. Previous change log: Changes in version 9: - add Jan's Reviewed-by - rendezvous threads in NMI handler to disable NMI. Note that NMI can be served as usual on threads that are chosen to initiate ucode loading on each core. - avoid unnecessary memory allocation or copy when creating a microcode patch (patch 12) - rework patch 1 to avoid microcode_update_match() being used to compare two arbitrary updates. - call .end_update in early loading path. Changes in version 8: - block #NMI handling during microcode loading (Patch 16) - Don't assume that all CPUs in the system have loaded the same ucode. So when parsing a blob, we attempt to save a patch as long as it matches the current cpu signature, regardless of the revision of the patch. And also for loading, we only require that the patch to be loaded isn't older than the cached one. - store an update after the first successful loading on a CPU - remove the patch that calls wbinvd() unconditionally before microcode loading. It is under internal discussion. - divide two big patches into several patches to improve readability. Changes in version 7: - cache one microcode update rather than a list of them. Assuming that all CPUs (including those that will be plugged in later) in the system have the same signature, an update that matches one CPU should match the others. Thus, one update is enough for microcode updating during CPU hot-plug and resuming. - To handle load failure, a microcode update is cached after it is applied, to avoid a broken update overriding a validated one. 
Unvalidated microcode updates are passed by arguments rather than another global variable, where this series slightly differs from Roger's suggestion in: https://lists.xen.org/archives/html/xen-devel/2019-03/msg00776.html - incorporate Sergey's patch (patch 10) to fix a bug: we maintain a variable to reflect the current microcode revision, but in some cases this variable isn't initialized during system boot time, which results in falsely reporting that the processor is susceptible to some known vulnerabilities. - fix issues reported by Sergey: https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00901.html - Responses to Sergey/Roger/Wei/Ashok's other comments. Major changes in version 6: - run wbinvd before updating microcode (patch 10) - add a userspace tool for late microcode update (patch 1) - scale the time to wait by the number of remaining CPUs to respond - remove 'cpu' parameters from some related callbacks and functions - save an ucode patch only if its supported CPU is allowed to mix with the current cpu. Changes in version 5: - support parallel microcode updates for all cores (see patch 8) - Address Roger's comments on the last version. Chao Gao (16): microcode/intel: extend microcode_update_match() microcode/amd: distinguish old and mismatched ucode in microcode_fits() microcode: introduce a global cache of ucode patch microcode: clean up microcode_resume_cpu microcode: remove struct ucode_cpu
[Xen-devel] [PATCH v10 08/16] microcode: pass a patch pointer to apply_microcode()
apply_microcode()'s always loading the cached ucode patch forces a patch to be stored before being loaded. Make apply_microcode() accept a patch pointer to remove the limitation so that a patch can be stored after a successful loading. Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- xen/arch/x86/microcode.c| 2 +- xen/arch/x86/microcode_amd.c| 5 ++--- xen/arch/x86/microcode_intel.c | 5 ++--- xen/include/asm-x86/microcode.h | 2 +- 4 files changed, 6 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 5c82a2d..b44e4d7 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -208,7 +208,7 @@ int microcode_resume_cpu(void) err = microcode_ops->collect_cpu_info(sig); if ( likely(!err) ) -err = microcode_ops->apply_microcode(); +err = microcode_ops->apply_microcode(microcode_cache); spin_unlock(&microcode_mutex); return err; diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index c96a3b3..c6d2ea3 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -253,7 +253,7 @@ static enum microcode_match_result compare_patch( return MIS_UCODE; } -static int apply_microcode(void) +static int apply_microcode(const struct microcode_patch *patch) { unsigned long flags; uint32_t rev; @@ -261,7 +261,6 @@ static int apply_microcode(void) unsigned int cpu = smp_processor_id(); struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); const struct microcode_header_amd *hdr; -const struct microcode_patch *patch = microcode_get_cache(); if ( !patch ) return -ENOENT; @@ -565,7 +564,7 @@ static int cpu_request_microcode(const void *buf, size_t bufsize) if ( match_cpu(microcode_get_cache()) ) { -error = apply_microcode(); +error = apply_microcode(microcode_get_cache()); if ( error ) break; } diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index 5f1ae2f..b1ec81d 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -323,7 +323,7 @@ static int get_matching_microcode(const void *mc) return 1; } -static int apply_microcode(void) +static int apply_microcode(const struct microcode_patch *patch) { unsigned long flags; uint64_t msr_content; @@ -331,7 +331,6 @@ static int apply_microcode(void) unsigned int cpu_num = raw_smp_processor_id(); struct cpu_signature *sig = &this_cpu(cpu_sig); const struct microcode_intel *mc_intel; -const struct microcode_patch *patch = microcode_get_cache(); if ( !patch ) return -ENOENT; @@ -429,7 +428,7 @@ static int cpu_request_microcode(const void *buf, size_t size) error = offset; if ( !error && match_cpu(microcode_get_cache()) ) -error = apply_microcode(); +error = apply_microcode(microcode_get_cache()); return error; } diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h index b0eee0e..02feb09 100644 --- a/xen/include/asm-x86/microcode.h +++ b/xen/include/asm-x86/microcode.h @@ -22,7 +22,7 @@ struct microcode_patch { struct microcode_ops { int (*cpu_request_microcode)(const void *buf, size_t size); int (*collect_cpu_info)(struct cpu_signature *csig); -int (*apply_microcode)(void); +int (*apply_microcode)(const struct microcode_patch *patch); int (*start_update)(void); void (*end_update_percpu)(void); void (*free_patch)(void *mc); -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v10 09/16] microcode: split out apply_microcode() from cpu_request_microcode()
During late microcode loading, apply_microcode() is invoked in cpu_request_microcode(). To make late microcode update more reliable, we want to put apply_microcode() into stop_machine context. So we split it out from cpu_request_microcode(). In general, for both early loading on the BSP and late loading, cpu_request_microcode() is called first to get the matching microcode update contained in the blob, and then apply_microcode() is invoked explicitly on each cpu in common code. Given that all CPUs are supposed to have the same signature, parsing microcode only needs to be done once. So cpu_request_microcode() is also moved out of microcode_update_cpu(). In some cases (e.g. a broken bios), the system may have multiple revisions of microcode update. So we would try to load a microcode update as long as it covers the current cpu. And if a cpu loads this patch successfully, the patch would be stored into the patch cache. Signed-off-by: Chao Gao Reviewed-by: Roger Pau Monné --- Changes in v10: - make microcode_update_cache static - raise an error if loading ucode failed with -EIO - ensure end_update_percpu() is called following a successful call of start_update() Changes in v9: - remove the calling of ->compare_patch in microcode_update_cpu(). - drop "microcode_" prefix for static function - microcode_parse_blob(). - rebase and fix conflict Changes in v8: - divide the original patch into three patches to improve readability - load an update on each cpu as long as the update covers current cpu - store an update after the first successful loading on a CPU - Make sure the current CPU (especially pf value) is covered by updates. changes in v7: - to handle load failure, unvalidated patches won't be cached. They are passed as function arguments. So if the update failed, we needn't any cleanup to the microcode cache. --- xen/arch/x86/microcode.c| 182 +++- xen/arch/x86/microcode_amd.c| 38 + xen/arch/x86/microcode_intel.c | 66 +++ xen/include/asm-x86/microcode.h | 5 +- 4 files changed, 178 insertions(+), 113 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index b44e4d7..d4738f6 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -189,12 +189,19 @@ static DEFINE_SPINLOCK(microcode_mutex); DEFINE_PER_CPU(struct cpu_signature, cpu_sig); -struct microcode_info { -unsigned int cpu; -uint32_t buffer_size; -int error; -char buffer[1]; -}; +/* + * Return a patch that covers the current CPU. If there are multiple patches, + * return the one with the highest revision number. Return an error if no + * patch is found and an error occurs during the parsing process. Otherwise + * return NULL. + */ +static struct microcode_patch *parse_blob(const char *buf, size_t len) +{ +if ( likely(!microcode_ops->collect_cpu_info(&this_cpu(cpu_sig))) ) +return microcode_ops->cpu_request_microcode(buf, len); + +return NULL; +} int microcode_resume_cpu(void) { @@ -220,15 +227,8 @@ void microcode_free_patch(struct microcode_patch *microcode_patch) xfree(microcode_patch); } -const struct microcode_patch *microcode_get_cache(void) -{ -ASSERT(spin_is_locked(&microcode_mutex)); - -return microcode_cache; -} - /* Return true if cache gets updated. 
Otherwise, return false */ -bool microcode_update_cache(struct microcode_patch *patch) +static bool microcode_update_cache(struct microcode_patch *patch) { ASSERT(spin_is_locked(&microcode_mutex)); @@ -249,49 +249,80 @@ bool microcode_update_cache(struct microcode_patch *patch) return true; } -static int microcode_update_cpu(const void *buf, size_t size) +/* + * Load a microcode update to current CPU. + * + * If no patch is provided, the cached patch will be loaded. Microcode update + * during APs bringup and CPU resuming falls into this case. + */ +static int microcode_update_cpu(const struct microcode_patch *patch) { -int err; -unsigned int cpu = smp_processor_id(); -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); +int err = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); -spin_lock(&microcode_mutex); +if ( unlikely(err) ) +return err; -err = microcode_ops->collect_cpu_info(sig); -if ( likely(!err) ) -err = microcode_ops->cpu_request_microcode(buf, size); -spin_unlock(&microcode_mutex); +if ( patch ) +err = microcode_ops->apply_microcode(patch); +else if ( microcode_cache ) +{ +spin_lock(&microcode_mutex); +err = microcode_ops->apply_microcode(microcode_cache); +if ( err == -EIO ) +{ +microcode_free_patch(microcode_cache); +microcode_cache = NULL; +} +spin_unlock(&microcode_mutex); +} +else +/* No patch to update */ +err = -ENOENT; return err; } -static long do_microcode_update(v
[Xen-devel] [PATCH v10 01/16] microcode/intel: extend microcode_update_match()
to a more generic function. So that it can be used alone to check an update against the CPU signature and current update revision. Note that enum microcode_match_result will be used in common code (aka microcode.c), it has been placed in the common header. And constifying the parameter of microcode_sanity_check() such that it can be called by microcode_update_match(). Signed-off-by: Chao Gao --- Changes in v10: - Drop RBs - assert that microcode passed to microcode_update_match() would pass sanity check. Constify the parameter of microcode_sanity_check() Changes in v9: - microcode_update_match() doesn't accept (sig, pf, rev) any longer. Hence, it won't be used to compare two arbitrary updates. - rewrite patch description Changes in v8: - make sure enough room for an extended header and signature array Changes in v6: - eliminate unnecessary type casting in microcode_update_match - check if a patch has an extend header Changes in v5: - constify the extended_signature - use named enum type for the return value of microcode_update_match --- xen/arch/x86/microcode_intel.c | 75 ++--- xen/include/asm-x86/microcode.h | 6 2 files changed, 47 insertions(+), 34 deletions(-) diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index 22fdeca..1a3ffa5 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -134,21 +134,11 @@ static int collect_cpu_info(unsigned int cpu_num, struct cpu_signature *csig) return 0; } -static inline int microcode_update_match( -unsigned int cpu_num, const struct microcode_header_intel *mc_header, -int sig, int pf) +static int microcode_sanity_check(const void *mc) { -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num); - -return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) && -(mc_header->rev > uci->cpu_sig.rev)); -} - -static int microcode_sanity_check(void *mc) -{ -struct microcode_header_intel *mc_header = mc; -struct extended_sigtable *ext_header = NULL; -struct extended_signature *ext_sig; +const struct microcode_header_intel *mc_header = mc; +const struct extended_sigtable *ext_header = NULL; +const struct extended_signature *ext_sig; unsigned long total_size, data_size, ext_table_size; unsigned int ext_sigcount = 0, i; uint32_t sum, orig_sum; @@ -234,6 +224,42 @@ static int microcode_sanity_check(void *mc) return 0; } +/* Check an update against the CPU signature and current update revision */ +static enum microcode_match_result microcode_update_match( +const struct microcode_header_intel *mc_header, unsigned int cpu) +{ +const struct extended_sigtable *ext_header; +const struct extended_signature *ext_sig; +unsigned int i; +struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); +unsigned int sig = uci->cpu_sig.sig; +unsigned int pf = uci->cpu_sig.pf; +unsigned int rev = uci->cpu_sig.rev; +unsigned long data_size = get_datasize(mc_header); +const void *end = (const void *)mc_header + get_totalsize(mc_header); + +ASSERT(!microcode_sanity_check(mc_header)); +if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) ) +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; + +ext_header = (const void *)(mc_header + 1) + data_size; +ext_sig = (const void *)(ext_header + 1); + +/* + * Make sure there is enough space to hold an extended header and enough + * array elements. 
+ */ +if ( (end < (const void *)ext_sig) || + (end < (const void *)(ext_sig + ext_header->count)) ) +return MIS_UCODE; + +for ( i = 0; i < ext_header->count; i++ ) +if ( sigmatch(sig, ext_sig[i].sig, pf, ext_sig[i].pf) ) +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; + +return MIS_UCODE; +} + /* * return 0 - no update found * return 1 - found update @@ -243,31 +269,12 @@ static int get_matching_microcode(const void *mc, unsigned int cpu) { struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); const struct microcode_header_intel *mc_header = mc; -const struct extended_sigtable *ext_header; unsigned long total_size = get_totalsize(mc_header); -int ext_sigcount, i; -struct extended_signature *ext_sig; void *new_mc; -if ( microcode_update_match(cpu, mc_header, -mc_header->sig, mc_header->pf) ) -goto find; - -if ( total_size <= (get_datasize(mc_header) + MC_HEADER_SIZE) ) +if ( microcode_update_match(mc, cpu) != NEW_UCODE ) return 0; -ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE; -ext_sigcount = ext_header->count; -ext_sig = (void *)ext_header + EXT_HEADER_SIZE; -for ( i = 0; i < ext_sigcount; i++ ) -{ -if (
[Xen-devel] [PATCH v10 16/16] microcode/intel: writeback and invalidate cache conditionally
This is needed to mitigate some issues on the specific Broadwell CPU (family 6, model 0x4F) identified in the code below. Signed-off-by: Chao Gao --- xen/arch/x86/microcode_intel.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index bcef668..4e5e7f9 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -305,6 +305,31 @@ static bool is_blacklisted(void) return false; } +static void microcode_quirk(void) +{ +struct cpuinfo_x86 *c; +uint64_t llc_size; + +/* + * Don't refer to current_cpu_data, which isn't fully initialized + * before this stage. + */ +if ( system_state < SYS_STATE_smp_boot ) +return; + +c = &current_cpu_data; +llc_size = c->x86_cache_size * 1024ULL; +do_div(llc_size, c->x86_max_cores); + +/* + * To mitigate some issues on this specific Broadwell CPU, writeback and + * invalidate cache regardless of ucode revision. + */ +if ( c->x86 == 6 && c->x86_model == 0x4F && c->x86_mask == 0x1 && + llc_size > 2621440 ) +wbinvd(); +} + static int apply_microcode(const struct microcode_patch *patch) { uint64_t msr_content; @@ -323,6 +348,8 @@ static int apply_microcode(const struct microcode_patch *patch) BUG_ON(local_irq_is_enabled()); +microcode_quirk(); + /* write microcode via MSR 0x79 */ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits); wrmsrl(MSR_IA32_UCODE_REV, 0x0ULL); -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v10 11/16] microcode: reduce memory allocation and copy when creating a patch
To create a microcode patch from a vendor-specific update, allocate_microcode_patch() copied everything from the update. This is not efficient. Essentially, we just need to go through ucodes in the blob, find the one with the newest revision and install it into the microcode_patch. In the process, buffers like mc_amd, equiv_cpu_table (on AMD side), and mc (on Intel side) can be reused. microcode_patch is now allocated only after it is certain that there is a matching ucode. Signed-off-by: Chao Gao Reviewed-by: Roger Pau Monné --- Changes in v10: - avoid unnecessary type casting * introduce compare_header on AMD side * specify the type of the first parameter of get_next_ucode_from_buffer() on Intel side Changes in v9: - new --- xen/arch/x86/microcode_amd.c | 112 + xen/arch/x86/microcode_intel.c | 67 +--- 2 files changed, 69 insertions(+), 110 deletions(-) diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 1d1bea4..f05db72 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -194,36 +194,6 @@ static bool match_cpu(const struct microcode_patch *patch) return patch && (microcode_fits(patch->mc_amd) == NEW_UCODE); } -static struct microcode_patch *alloc_microcode_patch( -const struct microcode_amd *mc_amd) -{ -struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch); -struct microcode_amd *cache = xmalloc(struct microcode_amd); -void *mpb = xmalloc_bytes(mc_amd->mpb_size); -struct equiv_cpu_entry *equiv_cpu_table = -xmalloc_bytes(mc_amd->equiv_cpu_table_size); - -if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table ) -{ -xfree(microcode_patch); -xfree(cache); -xfree(mpb); -xfree(equiv_cpu_table); -return ERR_PTR(-ENOMEM); -} - -memcpy(mpb, mc_amd->mpb, mc_amd->mpb_size); -cache->mpb = mpb; -cache->mpb_size = mc_amd->mpb_size; -memcpy(equiv_cpu_table, mc_amd->equiv_cpu_table, - mc_amd->equiv_cpu_table_size); -cache->equiv_cpu_table = equiv_cpu_table; -cache->equiv_cpu_table_size = mc_amd->equiv_cpu_table_size; -microcode_patch->mc_amd = cache; - -return microcode_patch; -} - static void free_patch(void *mc) { struct microcode_amd *mc_amd = mc; @@ -236,6 +206,17 @@ static void free_patch(void *mc) } } +static enum microcode_match_result compare_header( +const struct microcode_header_amd *new_header, +const struct microcode_header_amd *old_header) +{ +if ( new_header->processor_rev_id == old_header->processor_rev_id ) +return (new_header->patch_id > old_header->patch_id) ? NEW_UCODE + : OLD_UCODE; + +return MIS_UCODE; +} + static enum microcode_match_result compare_patch( const struct microcode_patch *new, const struct microcode_patch *old) { @@ -246,11 +227,7 @@ static enum microcode_match_result compare_patch( ASSERT(microcode_fits(new->mc_amd) != MIS_UCODE); ASSERT(microcode_fits(old->mc_amd) != MIS_UCODE); -if ( new_header->processor_rev_id == old_header->processor_rev_id ) -return (new_header->patch_id > old_header->patch_id) ? 
-NEW_UCODE : OLD_UCODE; - -return MIS_UCODE; +return compare_header(new_header, old_header); } static int apply_microcode(const struct microcode_patch *patch) @@ -328,18 +305,10 @@ static int get_ucode_from_buffer_amd( return -EINVAL; } -if ( mc_amd->mpb_size < mpbuf->len ) -{ -if ( mc_amd->mpb ) -{ -xfree(mc_amd->mpb); -mc_amd->mpb_size = 0; -} -mc_amd->mpb = xmalloc_bytes(mpbuf->len); -if ( mc_amd->mpb == NULL ) -return -ENOMEM; -mc_amd->mpb_size = mpbuf->len; -} +mc_amd->mpb = xmalloc_bytes(mpbuf->len); +if ( !mc_amd->mpb ) +return -ENOMEM; +mc_amd->mpb_size = mpbuf->len; memcpy(mc_amd->mpb, mpbuf->data, mpbuf->len); pr_debug("microcode: CPU%d size %zu, block size %u offset %zu equivID %#x rev %#x\n", @@ -459,8 +428,9 @@ static struct microcode_patch *cpu_request_microcode(const void *buf, size_t bufsize) { struct microcode_amd *mc_amd; +struct microcode_header_amd *saved = NULL; struct microcode_patch *patch = NULL; -size_t offset = 0; +size_t offset = 0, saved_size = 0; int error = 0; unsigned int current_cpu_id; unsigned int equiv_cpu_id; @@ -550,29 +520,22 @@ static struct microcode_patch *cpu_request_microcode(const void *buf, while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
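The shape of the rework — one pass over the candidates, remember only the newest matching header, copy a single time at the end — can be sketched independently of the Xen helpers (all names below are illustrative):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct hdr { uint32_t patch_id, processor_rev_id; };

    /* Returns nonzero if 'h' is newer than 'best' for the same processor rev. */
    static int newer(const struct hdr *h, const struct hdr *best)
    {
        return h->processor_rev_id == best->processor_rev_id &&
               h->patch_id > best->patch_id;
    }

    /* Walk 'n' candidate headers, then make one copy of the winner. */
    static struct hdr *pick_newest(const struct hdr *cand, size_t n)
    {
        const struct hdr *best = NULL;
        struct hdr *copy;
        size_t i;

        for ( i = 0; i < n; i++ )
            if ( !best || newer(&cand[i], best) )
                best = &cand[i];

        if ( !best )
            return NULL;

        copy = malloc(sizeof(*copy));       /* single allocation at the end */
        if ( copy )
            memcpy(copy, best, sizeof(*copy));
        return copy;
    }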
[Xen-devel] [PATCH v10 10/16] microcode: unify ucode loading during system bootup and resuming
During system bootup and resuming, CPUs just load the cached ucode. So one unified function microcode_update_one() is introduced. It takes a boolean to indicate whether ->start_update should be called. Since early_microcode_update_cpu() is only called on the BSP (APs call the unified function), start_update is always true, so this parameter is removed. There is a functional change: ->start_update is called on the BSP and ->end_update_percpu is called during system resuming. Neither was invoked by the previous microcode_resume_cpu(). Signed-off-by: Chao Gao --- Changes in v10: - call ->start_update for system resume from suspension Changes in v9: - return -EOPNOTSUPP rather than 0 if microcode_ops is NULL in microcode_update_one() - rebase and fix conflicts. Changes in v8: - split out from the previous patch --- xen/arch/x86/acpi/power.c | 2 +- xen/arch/x86/microcode.c| 91 +++-- xen/arch/x86/smpboot.c | 5 +-- xen/include/asm-x86/processor.h | 4 +- 4 files changed, 45 insertions(+), 57 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 269b140..01e6aec 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -278,7 +278,7 @@ static int enter_state(u32 state) console_end_sync(); -microcode_resume_cpu(); +microcode_update_one(true); if ( !recheck_cpu_features(0) ) panic("Missing previously available feature(s)\n"); diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index d4738f6..c2ea20f 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -203,24 +203,6 @@ static struct microcode_patch *parse_blob(const char *buf, size_t len) return NULL; } -int microcode_resume_cpu(void) -{ -int err; -struct cpu_signature *sig = &this_cpu(cpu_sig); - -if ( !microcode_ops ) -return 0; - -spin_lock(&microcode_mutex); - -err = microcode_ops->collect_cpu_info(sig); -if ( likely(!err) ) -err = microcode_ops->apply_microcode(microcode_cache); -spin_unlock(&microcode_mutex); - -return err; -} - void microcode_free_patch(struct microcode_patch *microcode_patch) { microcode_ops->free_patch(microcode_patch->mc); @@ -394,11 +376,38 @@ static int __init microcode_init(void) } __initcall(microcode_init); -int __init early_microcode_update_cpu(bool start_update) +/* Load a cached update to current cpu */ +int microcode_update_one(bool start_update) +{ +int err; + +if ( !microcode_ops ) +return -EOPNOTSUPP; + +microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); + +if ( start_update && microcode_ops->start_update ) +{ +err = microcode_ops->start_update(); +if ( err ) +return err; +} + +err = microcode_update_cpu(NULL); + +if ( microcode_ops->end_update_percpu ) +microcode_ops->end_update_percpu(); + +return err; +} + +/* BSP calls this function to parse ucode blob and then apply an update. */
+int __init early_microcode_update_cpu(void) { int rc = 0; void *data = NULL; size_t len; +struct microcode_patch *patch; if ( !microcode_ops ) return -ENOSYS; @@ -414,44 +423,26 @@ int __init early_microcode_update_cpu(bool start_update) data = bootstrap_map(&ucode_mod); } -microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); - if ( !data ) return -ENOMEM; -if ( start_update ) +patch = parse_blob(data, len); +if ( IS_ERR(patch) ) { -struct microcode_patch *patch; - -patch = parse_blob(data, len); -if ( IS_ERR(patch) ) -{ -printk(XENLOG_WARNING "Parsing microcode blob error %ld\n", - PTR_ERR(patch)); -return PTR_ERR(patch); -} - -if ( !patch ) -return -ENOENT; - -spin_lock(&microcode_mutex); -rc = microcode_update_cache(patch); -spin_unlock(&microcode_mutex); -ASSERT(rc); - -if ( microcode_ops->start_update ) -rc = microcode_ops->start_update(); - -if ( rc ) -return rc; +printk(XENLOG_WARNING "Parsing microcode blob error %ld\n", + PTR_ERR(patch)); +return PTR_ERR(patch); } -rc = microcode_update_cpu(NULL); +if ( !patch ) +return -ENOENT; -if ( microcode_ops->end_update_percpu ) -microcode_ops->end_update_percpu(); +spin_lock(&microcode_mutex); +rc = microcode_update_cache(patch); +spin_unlock(&microcode_mutex); +ASSERT(rc); -return rc; +return microcode_update_one(true); } int __init early_microcode_init(void) @@ -471,7 +462,7 @@ int __init early_microcode_init(void) microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); if ( ucode_mod.mod_end ||
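A toy model of the unified entry point may help to see what the boolean buys (purely illustrative; the printfs stand in for the vendor hooks):

    #include <stdbool.h>
    #include <stdio.h>

    /* Mirrors the flow of microcode_update_one(); names are stand-ins. */
    static int model_update_one(bool start)
    {
        if ( start )
            printf("->start_update()\n");   /* resume path, or the BSP */
        printf("collect signature, apply cached patch\n");
        printf("->end_update_percpu()\n");
        return 0;
    }

    int main(void)
    {
        model_update_one(true);   /* enter_state(): system resume */
        model_update_one(false);  /* AP bringup: BSP already started the update */
        return 0;
    }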
[Xen-devel] [PATCH v10 04/16] microcode: clean up microcode_resume_cpu
Previously, a per-cpu ucode cache was maintained: each CPU had one per-cpu update cache and there might be multiple versions of microcode. Thus microcode_resume_cpu tried its best to update microcode by loading every cached update until a successful load. But now the cache struct is simplified a lot and only a single ucode is cached. A single invocation of ->apply_microcode() loads the cache and brings the microcode up to date. Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- changes in v8: - new - separated from the following patch --- xen/arch/x86/microcode.c| 40 ++- xen/arch/x86/microcode_amd.c| 47 - xen/arch/x86/microcode_intel.c | 6 -- xen/include/asm-x86/microcode.h | 1 - 4 files changed, 2 insertions(+), 92 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index e218a9d..922b94f 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -215,8 +215,6 @@ int microcode_resume_cpu(unsigned int cpu) { int err; struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); -struct cpu_signature nsig; -unsigned int cpu2; if ( !microcode_ops ) return 0; @@ -224,42 +222,8 @@ int microcode_resume_cpu(unsigned int cpu) spin_lock(&microcode_mutex); err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig); -if ( err ) -{ -__microcode_fini_cpu(cpu); -spin_unlock(&microcode_mutex); -return err; -} - -if ( uci->mc.mc_valid ) -{ -err = microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid); -if ( err >= 0 ) -{ -if ( err ) -err = microcode_ops->apply_microcode(cpu); -spin_unlock(&microcode_mutex); -return err; -} -} - -nsig = uci->cpu_sig; -__microcode_fini_cpu(cpu); -uci->cpu_sig = nsig; - -err = -EIO; -for_each_online_cpu ( cpu2 ) -{ -uci = &per_cpu(ucode_cpu_info, cpu2); -if ( uci->mc.mc_valid && - microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid) > 0 ) -{ -err = microcode_ops->apply_microcode(cpu); -break; -} -} - -__microcode_fini_cpu(cpu); +if ( likely(!err) ) +err = microcode_ops->apply_microcode(cpu); spin_unlock(&microcode_mutex); return err; diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 2dca1df..04b00aa 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -654,52 +654,6 @@ static int cpu_request_microcode(unsigned int cpu, const void *buf, return error; } -static int microcode_resume_match(unsigned int cpu, const void *mc) -{ -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); -struct microcode_amd *mc_amd = uci->mc.mc_amd; -const struct microcode_amd *src = mc; - -if ( microcode_fits(src, cpu) != NEW_UCODE ) -return 0; - -if ( src != mc_amd ) -{ -if ( mc_amd ) -{ -xfree(mc_amd->equiv_cpu_table); -xfree(mc_amd->mpb); -xfree(mc_amd); -} - -mc_amd = xmalloc(struct microcode_amd); -uci->mc.mc_amd = mc_amd; -if ( !mc_amd ) -return -ENOMEM; -mc_amd->equiv_cpu_table = xmalloc_bytes(src->equiv_cpu_table_size); -if ( !mc_amd->equiv_cpu_table ) -goto err1; -mc_amd->mpb = xmalloc_bytes(src->mpb_size); -if ( !mc_amd->mpb ) -goto err2; - -mc_amd->equiv_cpu_table_size = src->equiv_cpu_table_size; -mc_amd->mpb_size = src->mpb_size; -memcpy(mc_amd->mpb, src->mpb, src->mpb_size); -memcpy(mc_amd->equiv_cpu_table, src->equiv_cpu_table, - src->equiv_cpu_table_size); -} - -return 1; - -err2: -xfree(mc_amd->equiv_cpu_table); err1: -xfree(mc_amd); -uci->mc.mc_amd = NULL; -return -ENOMEM; -} - static int start_update(void) { #if CONFIG_HVM @@ -719,7 +673,6 @@ static int start_update(void) } static const struct microcode_ops microcode_amd_ops = { -.microcode_resume_match = microcode_resume_match, 
.cpu_request_microcode= cpu_request_microcode, .collect_cpu_info = collect_cpu_info, .apply_microcode = apply_microcode, diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index eefc2d2..97f759e 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -455,13 +455,7 @@ static int cpu_request_microcode(unsigned int cpu, const void *buf, return error; } -static int microcode_resume_match(unsigned int cpu, const void *mc) -{ -return get_matching_microcode(mc, cpu); -} - static const
[Xen-devel] [PATCH v10 05/16] microcode: remove struct ucode_cpu_info
Remove the per-cpu cache field in struct ucode_cpu_info since it has been replaced by a global cache. This leaves only one field remaining in ucode_cpu_info. So this struct is removed and the remaining field (the cpu signature) is stored in the per-cpu area. The cpu status notifier is also removed. It was used to free the "mc" field to avoid a memory leak. Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- Changes in v9: - rebase and fix conflict Changes in v8: - split microcode_resume_cpu() cleanup to a separate patch. Changes in v6: - remove the whole struct ucode_cpu_info instead of the per-cpu cache in it. --- xen/arch/x86/apic.c | 2 +- xen/arch/x86/microcode.c| 57 +++ xen/arch/x86/microcode_amd.c| 59 + xen/arch/x86/microcode_intel.c | 28 +++ xen/arch/x86/spec_ctrl.c| 2 +- xen/include/asm-x86/microcode.h | 12 + 6 files changed, 34 insertions(+), 126 deletions(-) diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index ea0d561..6cdb50c 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -1190,7 +1190,7 @@ static void __init check_deadline_errata(void) else rev = (unsigned long)m->driver_data; -if ( this_cpu(ucode_cpu_info).cpu_sig.rev >= rev ) +if ( this_cpu(cpu_sig).rev >= rev ) return; setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE); diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 922b94f..d17dbec 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -187,7 +187,7 @@ const struct microcode_ops *microcode_ops; static DEFINE_SPINLOCK(microcode_mutex); -DEFINE_PER_CPU(struct ucode_cpu_info, ucode_cpu_info); +DEFINE_PER_CPU(struct cpu_signature, cpu_sig); struct microcode_info { unsigned int cpu; @@ -196,32 +196,17 @@ struct microcode_info { char buffer[1]; }; -static void __microcode_fini_cpu(unsigned int cpu) -{ -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); - -xfree(uci->mc.mc_valid); -memset(uci, 0, sizeof(*uci)); -} - -static void microcode_fini_cpu(unsigned int cpu) -{ -spin_lock(&microcode_mutex); -__microcode_fini_cpu(cpu); -spin_unlock(&microcode_mutex); -} - int microcode_resume_cpu(unsigned int cpu) { int err; -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); +struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); if ( !microcode_ops ) return 0; spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig); +err = microcode_ops->collect_cpu_info(cpu, sig); if ( likely(!err) ) err = microcode_ops->apply_microcode(cpu); spin_unlock(&microcode_mutex); @@ -268,16 +253,13 @@ static int microcode_update_cpu(const void *buf, size_t size) { int err; unsigned int cpu = smp_processor_id(); -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); +struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig); +err = microcode_ops->collect_cpu_info(cpu, sig); if ( likely(!err) ) err = microcode_ops->cpu_request_microcode(cpu, buf, size); -else -__microcode_fini_cpu(cpu); - spin_unlock(&microcode_mutex); return err; @@ -364,29 +346,10 @@ static int __init microcode_init(void) } __initcall(microcode_init); -static int microcode_percpu_callback( -struct notifier_block *nfb, unsigned long action, void *hcpu) -{ -unsigned int cpu = (unsigned long)hcpu; - -switch ( action ) -{ -case CPU_DEAD: -microcode_fini_cpu(cpu); -break; -} - -return NOTIFY_DONE; -} - -static struct notifier_block microcode_percpu_nfb = { -.notifier_call = microcode_percpu_callback, -}; - int __init early_microcode_update_cpu(bool start_update) { unsigned int cpu = 
smp_processor_id(); -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); +struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); int rc = 0; void *data = NULL; size_t len; @@ -405,7 +368,7 @@ int __init early_microcode_update_cpu(bool start_update) data = bootstrap_map(&ucode_mod); } -microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig); +microcode_ops->collect_cpu_info(cpu, sig); if ( data ) { @@ -424,7 +387,7 @@ int __init early_microcode_update_cpu(bool start_update) int __init early_microcode_init(void) { unsigned int cpu = smp_processor_id(); -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); +struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); int rc; rc = microcode_init_intel(); @@ -437,12 +400,10 @@ int __init early_microcode_init(void) if ( microcode_ops ) { -
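The conversion is mechanical; a stand-alone analogy of the before/after access patterns (plain arrays stand in for Xen's per-CPU areas):

    #include <stdio.h>

    struct cpu_signature { unsigned int sig, pf, rev; };

    /* Before: the signature hid inside a one-field wrapper struct. */
    struct ucode_cpu_info { struct cpu_signature cpu_sig; };
    static struct ucode_cpu_info ucode_cpu_info[4];   /* per_cpu(..., cpu) */

    /* After: the signature itself is the per-CPU variable. */
    static struct cpu_signature cpu_sig[4];           /* per_cpu(cpu_sig, cpu) */

    int main(void)
    {
        unsigned int cpu = 0;
        ucode_cpu_info[cpu].cpu_sig.rev = 0x25;  /* old: extra indirection */
        cpu_sig[cpu].rev = 0x25;                 /* new: direct access */
        printf("%#x %#x\n", ucode_cpu_info[cpu].cpu_sig.rev, cpu_sig[cpu].rev);
        return 0;
    }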
[Xen-devel] [PATCH v10 14/16] microcode: rendezvous CPUs in NMI handler and load ucode
When one core is loading ucode, handling NMI on sibling threads or on other cores in the system might be problematic. By rendezvousing all CPUs in NMI handler, it prevents NMI acceptance during ucode loading. Basically, some work previously done in stop_machine context is moved to NMI handler. Primary threads call in and load ucode in NMI handler. Secondary threads wait for the completion of ucode loading on all CPU cores. An option is introduced to disable this behavior. Signed-off-by: Chao Gao Signed-off-by: Sergey Dyasli --- Changes in v10: - rewrite based on Sergey's idea and patch - add Sergey's SOB. - add an option to disable ucode loading in NMI handler - don't send IPI NMI to the control thread to avoid unknown_nmi_error() in do_nmi(). - add an assertion to make sure the cpu chosen to handle platform NMI won't send self NMI. Otherwise, there is a risk that we encounter unknown_nmi_error() and system crashes. Changes in v9: - control threads send NMI to all other threads. Slave threads will stay in the NMI handling to prevent NMI acceptance during ucode loading. Note that self-nmi is invalid according to SDM. - s/rep_nop/cpu_relax - remove debug message in microcode_nmi_callback(). Printing debug message would take long times and control thread may timeout. - rebase and fix conflicts Changes in v8: - new --- docs/misc/xen-command-line.pandoc | 10 + xen/arch/x86/microcode.c | 95 --- xen/arch/x86/traps.c | 6 ++- xen/include/asm-x86/nmi.h | 3 ++ 4 files changed, 96 insertions(+), 18 deletions(-) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index 7c72e31..3017073 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -2056,6 +2056,16 @@ microcode in the cpio name space must be: - on Intel: kernel/x86/microcode/GenuineIntel.bin - on AMD : kernel/x86/microcode/AuthenticAMD.bin +### ucode_loading_in_nmi (x86) +> `= ` + +> Default: `true` + +When one CPU is loading ucode, handling NMIs on sibling threads or threads on +other cores might cause problems. By default, all CPUs rendezvous in NMI handler +and load ucode. This option provides a way to disable it in case of some CPUs +don't allow ucode loading in NMI handler. + ### unrestricted_guest (Intel) > `= ` diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 049eda6..64a4321 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -36,8 +36,10 @@ #include #include +#include #include #include +#include #include #include #include @@ -125,6 +127,9 @@ static int __init parse_ucode(const char *s) } custom_param("ucode", parse_ucode); +static bool __read_mostly opt_ucode_loading_in_nmi = true; +boolean_runtime_param("ucode_loading_in_nmi", opt_ucode_loading_in_nmi); + /* * 8MB ought to be enough. */ @@ -232,6 +237,7 @@ DEFINE_PER_CPU(struct cpu_signature, cpu_sig); */ static cpumask_t cpu_callin_map; static atomic_t cpu_out, cpu_updated; +const struct microcode_patch *nmi_patch; /* * Return a patch that covers current CPU. If there are multiple patches, @@ -354,6 +360,50 @@ static void set_state(unsigned int state) smp_wmb(); } +static int secondary_thread_work(void) +{ +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); + +return wait_for_state(LOADING_EXIT) ? 
0 : -EBUSY; +} + +static int primary_thread_work(const struct microcode_patch *patch) +{ +int ret; + +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); + +if ( !wait_for_state(LOADING_ENTER) ) +return -EBUSY; + +ret = microcode_ops->apply_microcode(patch); +if ( !ret ) +atomic_inc(&cpu_updated); +atomic_inc(&cpu_out); + +return ret; +} + +static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu) +{ +unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask)); +unsigned int controller = cpumask_first(&cpu_online_map); + +/* System-generated NMI, will be ignored */ +if ( loading_state != LOADING_CALLIN ) +return 0; + +if ( cpu == controller || (!opt_ucode_loading_in_nmi && cpu == primary) ) +return 0; + +if ( cpu == primary ) +primary_thread_work(nmi_patch); +else +secondary_thread_work(); + +return 0; +} + static int secondary_thread_fn(void) { unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask)); @@ -361,10 +411,7 @@ static int secondary_thread_fn(void) if ( !wait_for_state(LOADING_CALLIN) ) return -EBUSY; -cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); - -if ( !wait_for_state(LOADING_EXIT) ) -return -EBUSY; +self_nmi(); /* Copy update revision from the primary thread. */ this_cpu(cpu_sig).rev = per_cpu(cpu_sig, primary).rev; @@ -379
Re: [Xen-devel] [PATCH v10 09/16] microcode: split out apply_microcode() from cpu_request_microcode()
On Thu, Sep 12, 2019 at 04:07:16PM +0200, Jan Beulich wrote: >On 12.09.2019 09:22, Chao Gao wrote: >> @@ -249,49 +249,80 @@ bool microcode_update_cache(struct microcode_patch >> *patch) >> return true; >> } >> >> -static int microcode_update_cpu(const void *buf, size_t size) >> +/* >> + * Load a microcode update to current CPU. >> + * >> + * If no patch is provided, the cached patch will be loaded. Microcode >> update >> + * during APs bringup and CPU resuming falls into this case. >> + */ >> +static int microcode_update_cpu(const struct microcode_patch *patch) >> { >> -int err; >> -unsigned int cpu = smp_processor_id(); >> -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); >> +int err = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); >> >> -spin_lock(&microcode_mutex); >> +if ( unlikely(err) ) >> +return err; >> >> -err = microcode_ops->collect_cpu_info(sig); >> -if ( likely(!err) ) >> -err = microcode_ops->cpu_request_microcode(buf, size); >> -spin_unlock(&microcode_mutex); >> +if ( patch ) >> +err = microcode_ops->apply_microcode(patch); >> +else if ( microcode_cache ) >> +{ >> +spin_lock(&microcode_mutex); >> +err = microcode_ops->apply_microcode(microcode_cache); >> +if ( err == -EIO ) >> +{ >> +microcode_free_patch(microcode_cache); >> +microcode_cache = NULL; >> +} >> +spin_unlock(&microcode_mutex); >> +} > >I'm having trouble understanding the locking discipline here: Why >do you call ->apply_microcode() once with the lock held and once >without? If this is to guard against microcode_cache changing, Yes. microcode_cache is protected by microcode_mutex; >then (a) the check of it being non-NULL would need to be done with >the lock held as well and Will do. >(b) you'd need to explain why the non- >locked call to ->apply_microcode() is okay. ->apply_microcode() was always called with this lock held because it always read the old per-cpu cache which was protected by the lock. It gave us an impression that ->apply_microcode() was protected by the lock. The patch before this one makes ->apply_microcode() accept a patch pointer. With this change, if the patch being passed should be accessed with some lock held (like the secondary call site above), we acquire the lock. Otherwise, no lock is taken and the caller of microcode_update_cpu() is supposed to guarantee the patch won't be changed by others. > >It certainly wasn't this way in v8, yet the v9 revision log also >doesn't mention such a (not insignificant) change (which is part >of the reason why I didn't spot it in v9). It is my bad. 
> >> +else >> +/* No patch to update */ >> +err = -ENOENT; >> >> return err; >> } >> >> -static long do_microcode_update(void *_info) >> +static long do_microcode_update(void *patch) >> { >> -struct microcode_info *info = _info; >> -int error; >> - >> -BUG_ON(info->cpu != smp_processor_id()); >> +unsigned int cpu; >> +int ret = microcode_update_cpu(patch); >> >> -error = microcode_update_cpu(info->buffer, info->buffer_size); >> -if ( error ) >> -info->error = error; >> +/* Store the patch after a successful loading */ >> +if ( !ret && patch ) >> +{ >> +spin_lock(&microcode_mutex); >> +microcode_update_cache(patch); >> +spin_unlock(&microcode_mutex); >> +patch = NULL; >> +} >> >> if ( microcode_ops->end_update_percpu ) >> microcode_ops->end_update_percpu(); >> >> -info->cpu = cpumask_next(info->cpu, &cpu_online_map); >> -if ( info->cpu < nr_cpu_ids ) >> -return continue_hypercall_on_cpu(info->cpu, do_microcode_update, >> info); >> +/* >> + * Each thread tries to load ucode and only the first thread of a core >> + * would succeed. Ignore error other than -EIO. >> + */ >> +if ( ret != -EIO ) >> +ret = 0; > >I don't think this is a good idea. Ignoring a _specific_ error >code (e.g. indicating "already loaded" or "newer patch already >loaded") is fine, but here you also ignore things like -ENOMEM >or -EINVAL. will do. > >> +cpu = cpumask_next(smp_processor_id(), &cpu_online_map); >> +if ( cpu
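Condensing the discipline Chao describes into one place (a sketch of the agreed direction, including the under-lock NULL check Jan asks for — not the literal patch):

    static int apply_with_discipline(const struct microcode_patch *patch)
    {
        int err;

        if ( patch )
            err = microcode_ops->apply_microcode(patch);   /* caller owns it */
        else
        {
            /* The global cache may change under us: check and use it
             * inside the same critical section. */
            spin_lock(&microcode_mutex);
            if ( microcode_cache )
                err = microcode_ops->apply_microcode(microcode_cache);
            else
                err = -ENOENT;
            spin_unlock(&microcode_mutex);
        }

        return err;
    }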
Re: [Xen-devel] [PATCH v10 12/16] x86/microcode: Synchronize late microcode loading
On Thu, Sep 12, 2019 at 05:32:22PM +0200, Jan Beulich wrote: >On 12.09.2019 09:22, Chao Gao wrote: >> @@ -264,38 +336,158 @@ static int microcode_update_cpu(const struct >> microcode_patch *patch) >> return err; >> } >> >> -static long do_microcode_update(void *patch) >> +static bool wait_for_state(unsigned int state) >> +{ >> +while ( loading_state != state ) >> +{ >> +if ( state != LOADING_EXIT && loading_state == LOADING_EXIT ) >> +return false; > >This is at least somewhat confusing: There's no indication here >that "loading_state" may change behind the function's back. So >in general one could be (and I initially was) tempted to suggest >dropping the apparently redundant left side of the &&. But that >would end up wrong if the compiler translates the above to two >separate reads of "loading_state". Therefore I'd like to suggest > >static bool wait_for_state(typeof(loading_state) state) >{ >typeof(loading_state) cur_state; > >while ( (cur_state = ACCESS_ONCE(loading_state)) != state ) >{ >if ( cur_state == LOADING_EXIT ) >return false; >cpu_relax(); >} > >return true; >} > >or something substantially similar (if, e.g., you dislike the >use of typeof() here). The code snippet above is terrific. Will take it. > >> +static int secondary_thread_fn(void) >> +{ >> +unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask)); >> + >> +if ( !wait_for_state(LOADING_CALLIN) ) >> +return -EBUSY; >> + >> +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); >> + >> +if ( !wait_for_state(LOADING_EXIT) ) >> +return -EBUSY; > >This return looks to be unreachable, doesn't it? Yes. I will use a variable to hold its return value and assert the return value is always true. Other comments are reasonable and I will follow your suggestion. Thanks Chao
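For symmetry with Jan's reader, the writer side would pair the state change with a barrier so that data written before a transition is visible to CPUs that observe it (a sketch of what the patch's set_state() amounts to, assuming Xen's smp_wmb()/ACCESS_ONCE()):

    static void set_state(typeof(loading_state) state)
    {
        /* Publish every write that happened before the transition to
         * CPUs that subsequently observe the new state. */
        smp_wmb();
        ACCESS_ONCE(loading_state) = state;
    }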
Re: [Xen-devel] [PATCH v10 01/16] microcode/intel: extend microcode_update_match()
On Fri, Sep 13, 2019 at 08:50:59AM +0200, Jan Beulich wrote: >On 12.09.2019 12:24, Jan Beulich wrote: >> On 12.09.2019 09:22, Chao Gao wrote: >>> --- a/xen/arch/x86/microcode_intel.c >>> +++ b/xen/arch/x86/microcode_intel.c >>> @@ -134,21 +134,11 @@ static int collect_cpu_info(unsigned int cpu_num, >>> struct cpu_signature *csig) >>> return 0; >>> } >>> >>> -static inline int microcode_update_match( >>> -unsigned int cpu_num, const struct microcode_header_intel *mc_header, >>> -int sig, int pf) >>> +static int microcode_sanity_check(const void *mc) >>> { >>> -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num); >>> - >>> -return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) && >>> -(mc_header->rev > uci->cpu_sig.rev)); >>> -} >>> - >>> -static int microcode_sanity_check(void *mc) >>> -{ >>> -struct microcode_header_intel *mc_header = mc; >>> -struct extended_sigtable *ext_header = NULL; >>> -struct extended_signature *ext_sig; >>> +const struct microcode_header_intel *mc_header = mc; >>> +const struct extended_sigtable *ext_header = NULL; >>> +const struct extended_signature *ext_sig; >>> unsigned long total_size, data_size, ext_table_size; >>> unsigned int ext_sigcount = 0, i; >>> uint32_t sum, orig_sum; >>> @@ -234,6 +224,42 @@ static int microcode_sanity_check(void *mc) >>> return 0; >>> } >>> >>> +/* Check an update against the CPU signature and current update revision */ >>> +static enum microcode_match_result microcode_update_match( >>> +const struct microcode_header_intel *mc_header, unsigned int cpu) >>> +{ >>> +const struct extended_sigtable *ext_header; >>> +const struct extended_signature *ext_sig; >>> +unsigned int i; >>> +struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); >>> +unsigned int sig = uci->cpu_sig.sig; >>> +unsigned int pf = uci->cpu_sig.pf; >>> +unsigned int rev = uci->cpu_sig.rev; >>> +unsigned long data_size = get_datasize(mc_header); >>> +const void *end = (const void *)mc_header + get_totalsize(mc_header); >>> + >>> +ASSERT(!microcode_sanity_check(mc_header)); >>> +if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) ) >>> +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; >>> + >>> +ext_header = (const void *)(mc_header + 1) + data_size; >>> +ext_sig = (const void *)(ext_header + 1); >>> + >>> +/* >>> + * Make sure there is enough space to hold an extended header and >>> enough >>> + * array elements. >>> + */ >>> +if ( (end < (const void *)ext_sig) || >>> + (end < (const void *)(ext_sig + ext_header->count)) ) >>> +return MIS_UCODE; >> >> With you now assuming that the blob has previously passed >> microcode_sanity_check(), this only needs to be >> >> if ( (end <= (const void *)ext_sig) ) >> return MIS_UCODE; >> >> now afaict. >> >> Reviewed-by: Jan Beulich >> preferably with this adjustment (assuming you agree). > >FAOD: I'd be happy to make the adjustment while committing, but >I'd like to have your consent (or you proving me wrong). This >would, as it looks, allow everything up to patch 8 to go in. Please go ahead. Thanks Chao
Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
On Fri, Sep 13, 2019 at 10:02:24AM +, Spassov, Stanislav wrote: >On Thu, Dec 13, 2018 at 07:54, Chao Gao wrote: >>On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote: >>>>>> On 13.12.18 at 04:46, wrote: >>>> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote: >>>>>>>> On 12.12.18 at 16:18, wrote: >>>>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote: >>>>>>>>>> On 12.12.18 at 08:06, wrote: >>>>>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote: >>>>>>>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote: >>>>>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: >>>>>>>>>>> I find some pass-thru devices don't work any more across guest >>>>>>>>>>> reboot. >>>>>>>>>>> Assigning it to another guest also meets the same issue. And the >>>>>>>>>>> only >>>>>>>>>>> way to make it work again is un-binding and binding it to pciback. >>>>>>>>>>> Someone reported this issue one year ago [1]. More detail also can >>>>>>>>>>> be >>>>>>>>>>> found in [2]. >>>>>>>>>>> >>>>>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly >>>>>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall >>>>>>>>>>> bit >>>>>>>>>>> to mask all MSI interrupts after it detected a potential security >>>>>>>>>>> issue. Even after device reset, Xen didn't reset its internal >>>>>>>>>>> maskall >>>>>>>>>>> bit. As a result, maskall bit would be set again in next write to >>>>>>>>>>> MSI-X message control register. >>>>>>>>>>> >>>>>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting >>>>>>>>>>> MSI-X >>>>>>>>>>> internal state of a device, we employ it to fix this issue rather >>>>>>>>>>> than >>>>>>>>>>> introducing another dedicated sub-hypercall. >>>>>>>>>>> >>>>>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between >>>>>>>>>>> the device's msix and pirq has been created. This limitation >>>>>>>>>>> prevents >>>>>>>>>>> us calling this function when detaching a device from a guest during >>>>>>>>>>> guest shutdown. Thus it is called right before calling >>>>>>>>>>> PHYSDEVOPS_prepare_msix(). >>>>>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the >>>>>>>>>> () at the end of the hypercall name since it's not a function. >>>>>>>>>> >>>>>>>>>> I'm also wondering why the release can't be done when the device is >>>>>>>>>> detached from the guest (or the guest has been shut down). This makes >>>>>>>>>> me worry about the raciness of the attach/detach procedure: if >>>>>>>>>> there's >>>>>>>>>> a state where pciback assumes the device has been detached from the >>>>>>>>>> guest, but there are still pirqs bound, an attempt to attach to >>>>>>>>>> another guest in such state will fail. >>>>>>>>> >>>>>>>>>I wonder whether this additional reset functionality could be done out >>>>>>>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset >>>>>>>>>and then do the extra things that are not properly done there. >>>>>>>> >>>>>>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying >>>>>>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal >>>>>>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished >>>>>
Re: [Xen-devel] [PATCH v10 14/16] microcode: rendezvous CPUs in NMI handler and load ucode
On Fri, Sep 13, 2019 at 11:14:59AM +0200, Jan Beulich wrote: >On 12.09.2019 09:22, Chao Gao wrote: >> When one core is loading ucode, handling NMI on sibling threads or >> on other cores in the system might be problematic. By rendezvousing >> all CPUs in NMI handler, it prevents NMI acceptance during ucode >> loading. >> >> Basically, some work previously done in stop_machine context is >> moved to NMI handler. Primary threads call in and load ucode in >> NMI handler. Secondary threads wait for the completion of ucode >> loading on all CPU cores. An option is introduced to disable this >> behavior. >> >> Signed-off-by: Chao Gao >> Signed-off-by: Sergey Dyasli > > > >> --- a/docs/misc/xen-command-line.pandoc >> +++ b/docs/misc/xen-command-line.pandoc >> @@ -2056,6 +2056,16 @@ microcode in the cpio name space must be: >>- on Intel: kernel/x86/microcode/GenuineIntel.bin >>- on AMD : kernel/x86/microcode/AuthenticAMD.bin >> >> +### ucode_loading_in_nmi (x86) >> +> `= ` >> + >> +> Default: `true` >> + >> +When one CPU is loading ucode, handling NMIs on sibling threads or threads >> on >> +other cores might cause problems. By default, all CPUs rendezvous in NMI >> handler >> +and load ucode. This option provides a way to disable it in case of some >> CPUs >> +don't allow ucode loading in NMI handler. > >We already have "ucode=", why don't you extend it to allow "ucode=nmi" >and "ucode=no-nmi"? (In any event, please no underscores in new >command line options - use hyphens if necessary.) Ok. Will extend the "ucode" parameter. > >> @@ -232,6 +237,7 @@ DEFINE_PER_CPU(struct cpu_signature, cpu_sig); >> */ >> static cpumask_t cpu_callin_map; >> static atomic_t cpu_out, cpu_updated; >> +const struct microcode_patch *nmi_patch; > >static > >> @@ -354,6 +360,50 @@ static void set_state(unsigned int state) >> smp_wmb(); >> } >> >> +static int secondary_thread_work(void) >> +{ >> +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); >> + >> +return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY; >> +} >> + >> +static int primary_thread_work(const struct microcode_patch *patch) > >I think it would be nice if both functions carried "nmi" in their >names - how about {primary,secondary}_nmi_work()? Or wait - the >primary one gets used outside of NMI as well, so I'm fine with its >name. >The secondary one, otoh, is NMI-specific and also its only >caller doesn't care about the return value, so I'd suggest making >it return void alongside adding some form of "nmi" to its name. Or, Will do. >perhaps even better, have secondary_thread_fn() call it, moving the >cpu_sig update here (and of course then there shouldn't be any >"nmi" added to its name). Even with "ucode=no-nmi", secondary threads have to busy-loop in NMI handling until primary threads complete the update. Otherwise, they may access MSRs (like SPEC_CTRL), which is considered unsafe. > >> +static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu) >> +{ >> +unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask)); >> +unsigned int controller = cpumask_first(&cpu_online_map); >> + >> +/* System-generated NMI, will be ignored */ >> +if ( loading_state != LOADING_CALLIN ) >> +return 0; > >I'm not happy at all to see NMIs being ignored. But by returning >zero, you do _not_ ignore it. Did you perhaps mean "will be ignored >here", in which case perhaps better "leave to main handler"? And >for the comment to extend to the other two conditions right below, >I think it would be better to combine them all into a single if(). 
> >Also, throughout the series, I think you want to consistently use >ACCESS_ONCE() for reads/writes from/to loading_state. > >> +if ( cpu == controller || (!opt_ucode_loading_in_nmi && cpu == primary) >> ) >> +return 0; > >Why not As I said above, secondary threads are expected to stay in NMI handler regardless of the setting of opt_ucode_loading_in_nmi. >> --- a/xen/arch/x86/traps.c >> +++ b/xen/arch/x86/traps.c >> @@ -126,6 +126,8 @@ boolean_param("ler", opt_ler); >> /* LastExceptionFromIP on this hardware. Zero if LER is not in use. */ >> unsigned int __read_mostly ler_msr; >> >> +unsigned int __read_mostly nmi_cpu; > >Since this variable (for now) is never written to it should gain a >comment saying why this is, and perhaps it would then also better be >const rather than __read_mostly. How about using the macro below: #define NMI_CPU 0 Thanks Chao
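Folding Jan's suggestions (a single if(), ACCESS_ONCE() on loading_state) into the callback could look like this — illustrative only, and keeping Chao's point that secondary threads park here regardless of the option:

    static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
    {
        unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));

        /*
         * Leave the NMI to the main handler when not in the call-in phase,
         * on the control thread, or on a primary thread with in-NMI loading
         * disabled; secondary threads always stay.
         */
        if ( ACCESS_ONCE(loading_state) != LOADING_CALLIN ||
             cpu == cpumask_first(&cpu_online_map) ||
             (!opt_ucode_loading_in_nmi && cpu == primary) )
            return 0;

        if ( cpu == primary )
            primary_thread_work(nmi_patch);
        else
            secondary_thread_work();

        return 0;
    }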
Re: [Xen-devel] [PATCH v10 00/16] improve late microcode loading
On Fri, Sep 13, 2019 at 10:47:36AM +0200, Jan Beulich wrote: >On 12.09.2019 09:22, Chao Gao wrote: >> This series includes below changes: >> 1. Patch 1-11: introduce a global microcode cache and some cleanup >> 2. Patch 12: synchronize late microcode loading >> 3. Patch 13: support parallel microcodes update on different cores >> 4. Patch 14: block #NMI handling during microcode loading >> 5. Patch 15: disable late ucode loading due to BDF90 >> 6. Patch 16: call wbinvd() conditionally > >I don't know why it didn't occur to me earlier, but what about >parked / offlined CPUs? They'll have their ucode updated when they >get brought back online, but until then their ucode will disagree >with that of the online CPUs. For truly offline CPUs this may be >fine, but parked ones should probably be updated, perhaps via the >same approach as used when C-state data becomes available (see >set_cx_pminfo())? Yes. It provides a means to wake up the parked CPU and a chance to run some code (like loading ucode). But parked CPUs are cleared from sibling info and cpu_online_map (see __cpu_disable()). If parallel ucode loading is expected on parked CPUs, we should be able to determine the primary threads and the number of cores no matter whether a CPU is online or parked. To this end, a new sibling map would have to be maintained for each CPU, one that isn't changed when a CPU gets parked. In the Linux kernel, the approach is quite simple: late loading is prohibited if any CPU is parked; the admin should online all parked CPUs before loading ucode. Do you have any preference? Thanks Chao
Re: [Xen-devel] [PATCH v10 15/16] microcode: disable late loading if CPUs are affected by BDF90
On Fri, Sep 13, 2019 at 11:22:59AM +0200, Jan Beulich wrote: >On 12.09.2019 09:22, Chao Gao wrote: >> @@ -283,6 +284,27 @@ static enum microcode_match_result compare_patch( >> : OLD_UCODE; >> } >> >> +static bool is_blacklisted(void) >> +{ >> +struct cpuinfo_x86 *c = &current_cpu_data; >> +uint64_t llc_size = c->x86_cache_size * 1024ULL; >> +struct cpu_signature *sig = &this_cpu(cpu_sig); >> + >> +do_div(llc_size, c->x86_max_cores); >> + >> +/* >> + * Late loading on model 79 with microcode revision less than 0x0b21 >> + * and LLC size per core bigger than 2.5MB may result in a system hang. >> + * This behavior is documented in item BDF90, #334165 (Intel Xeon >> + * Processor E7-8800/4800 v4 Product Family). >> + */ >> +if ( c->x86 == 6 && c->x86_model == 0x4F && c->x86_mask == 0x1 && >> + llc_size > 2621440 && sig->rev < 0x0b21 ) >> +return true; >> + >> +return false; >> +} > >Isn't this misbehavior worked around by the wbinvd() you add in the next >patch? Hi Jan and Andrew, Perhaps I misunderstood what I was told. I am confirming with Ashok whether this patch is necessary. > >> --- a/xen/include/asm-x86/microcode.h >> +++ b/xen/include/asm-x86/microcode.h >> @@ -30,6 +30,7 @@ struct microcode_ops { >> bool (*match_cpu)(const struct microcode_patch *patch); >> enum microcode_match_result (*compare_patch)( >> const struct microcode_patch *new, const struct microcode_patch >> *old); >> +bool (*is_blacklisted)(void); > >Why a hook rather than a boolean flag, which could be set by >microcode_update_one() (as invoked during AP bringup)? How about setting the boolean flag in Intel_errata_workarounds? One limitation of setting the flag in microcode_update_one() is: BSP also calls microcode_update_one(). But calculating LLC size per core on BSP would meet the same issue as the following patch (i.e. patch 16/16): BSP's current_cpu_data isn't initialized properly. We might need to revert commit f97838bbd980a01 in some way and reenumerate features after ucode loading is done. Thanks Chao
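Jan's flag-instead-of-hook alternative, with the BSP caveat Chao raises folded in as a comment, might look like this (a hypothetical sketch, not a committed interface):

    static bool __read_mostly ucode_blacklisted;

    /*
     * Latch the result once, from a path where current_cpu_data is fully
     * populated (e.g. the errata-workaround path) rather than from
     * microcode_update_one(): the BSP loads ucode before its
     * current_cpu_data is properly initialized.
     */
    void microcode_latch_blacklist(void)
    {
        if ( !ucode_blacklisted && is_blacklisted() )
            ucode_blacklisted = true;
    }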
Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated
On Wed, Sep 18, 2019 at 02:16:13PM -0700, Joe Jin wrote: >On 9/16/19 11:48 PM, Jan Beulich wrote: >> On 17.09.2019 00:20, Joe Jin wrote: >>> On 9/16/19 1:01 AM, Jan Beulich wrote: On 13.09.2019 18:38, Joe Jin wrote: > On 9/13/19 12:14 AM, Jan Beulich wrote: >> On 12.09.2019 20:03, Joe Jin wrote: >>> --- a/xen/drivers/passthrough/io.c >>> +++ b/xen/drivers/passthrough/io.c >>> @@ -412,6 +412,9 @@ int pt_irq_create_bind( >>> pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec; >>> pirq_dpci->gmsi.gflags = gflags; >>> } >>> + >>> +if ( hvm_funcs.sync_pir_to_irr ) >>> + >>> hvm_funcs.sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]); >> >> If the need for this change can be properly explained, then it >> still wants converting to alternative_vcall() - the the other >> caller of this hook. Or perhaps even better move vlapic.c's >> wrapper (suitably renamed) into hvm.h, and use it here. > > Yes I agree, I'm not 100% sure, so I set it to RFC. And btw, please also attach a brief comment here, to clarify why the syncing is needed precisely at this point. >> Additionally, the code setting pirq_dpci->gmsi.dest_vcpu_id >> (right after your code insertion) allows for the field to be >> invalid, which I think you need to guard against. > > I think you means multiple destination, then it's -1? The reason for why it might be -1 are irrelevant here, I think. You need to handle the case both to avoid an out-of-bounds array access and to make sure an IRR bit wouldn't still get propagated too late in some special case. >>> >>> Add following checks? >>> if ( dest_vcpu_id >= 0 && dest_vcpu_id < d->max_vcpus && >>> d->vcpu[dest_vcpu_id]->runstate.state <= RUNSTATE_blocked ) >> >> Just the >= part should suffice; without an explanation I don't >> see why you want the runstate check (which after all is racy >> anyway afaict). >> Also - what about the respective other path in the function, dealing with PT_IRQ_TYPE_PCI and PT_IRQ_TYPE_MSI_TRANSLATE? It seems to me that there's the same chance of deferring IRR propagation for too long? >>> >>> This is possible, can you please help on how to get which vcpu associate >>> the IRQ? >>> I did not found any helper on current Xen. >> >> There's no such helper, I'm afraid. Looking at hvm_migrate_pirq() >> and hvm_girq_dest_2_vcpu_id() I notice that the former does nothing >> if pirq_dpci->gmsi.posted is set. Hence pirq_dpci->gmsi.dest_vcpu_id >> isn't really used in this case (please double check), and so you may >> want to update the field alongside setting pirq_dpci->gmsi.posted in >> pt_irq_create_bind(), covering the multi destination case. >> >> Your code addition still visible in context above may then want to >> be further conditionalized upon iommu_intpost or (perhaps better) >> pirq_dpci->gmsi.posted being set. >> > >Sorry this is new to me, and I have to study from code. >Do you think below check cover all conditions? > >diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c >index 4290c7c710..90c3da441d 100644 >--- a/xen/drivers/passthrough/io.c >+++ b/xen/drivers/passthrough/io.c >@@ -412,6 +412,10 @@ int pt_irq_create_bind( > pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec; > pirq_dpci->gmsi.gflags = gflags; > } >+ >+/* Notify guest of pending interrupts if necessary */ >+if ( dest_vcpu_id >= 0 && iommu_intpost && pirq_dpci->gmsi.posted >) Hi Joe, Do you enable vt-d posted interrupt in Xen boot options? I don't see why it is specific to vt-d posted interrupt. 
If only CPU side posted interrupt is enabled, it is also possible that interrupts are not propagated from PIR to IRR in time. Thanks Chao
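In outline, the guard under discussion would sit at the end of the MSI bind path roughly as below (a sketch of the proposal only; vlapic_sync_pir_to_irr() stands for the wrapper Jan suggests exposing, and whether the iommu_intpost qualification belongs in the condition is exactly the open question Chao raises):

    /* Propagate interrupts posted while the vector/destination changed. */
    if ( pirq_dpci->gmsi.posted && pirq_dpci->gmsi.dest_vcpu_id >= 0 )
        vlapic_sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);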
[Xen-devel] [PATCH v11 1/7] microcode: split out apply_microcode() from cpu_request_microcode()
During late microcode loading, apply_microcode() is invoked in cpu_request_microcode(). To make late microcode update more reliable, we want to put the apply_microcode() into stop_machine context. So we split it out from cpu_request_microcode(). In general, for both early loading on BSP and late loading, cpu_request_microcode() is called first to get the matching microcode update contained by the blob and then apply_microcode() is invoked explicitly on each cpu in common code. Given that all CPUs are supposed to have the same signature, parsing microcode only needs to be done once. So cpu_request_microcode() is also moved out of microcode_update_cpu(). In some cases (e.g. a broken bios), the system may have multiple revisions of microcode update. So we would try to load a microcode update as long as it covers the current cpu. And if a cpu loads this patch successfully, the patch would be stored into the patch cache. Note that calling ->apply_microcode() itself doesn't require any lock being held. But the parameter passed to it may be protected by some locks. E.g. microcode_update_cpu() acquires microcode_mutex to avoid microcode_cache being updated by others. Signed-off-by: Chao Gao --- Changes in v11: - drop Roger's RB. - acquire microcode_mutex before checking whether microcode_cache is NULL - ignore -EINVAL which indicates an equal/newer ucode is already loaded. - free 'buffer' earlier to avoid goto clauses in microcode_update() Changes in v10: - make microcode_update_cache static - raise an error if loading ucode failed with -EIO - ensure end_update_percpu() is called following a successful call of start_update() Changes in v9: - remove the calling of ->compare_patch in microcode_update_cpu(). - drop "microcode_" prefix for static function - microcode_parse_blob(). - rebase and fix conflict Changes in v8: - divide the original patch into three patches to improve readability - load an update on each cpu as long as the update covers current cpu - store an update after the first successful loading on a CPU - Make sure the current CPU (especially pf value) is covered by updates. changes in v7: - to handle load failure, unvalidated patches won't be cached. They are passed as function arguments. So if an update failed, we don't need any cleanup of the microcode cache. --- xen/arch/x86/microcode.c| 173 +++- xen/arch/x86/microcode_amd.c| 38 - xen/arch/x86/microcode_intel.c | 66 +++ xen/include/asm-x86/microcode.h | 5 +- 4 files changed, 172 insertions(+), 110 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index b44e4d7..3ea2a6e 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -189,12 +189,19 @@ static DEFINE_SPINLOCK(microcode_mutex); DEFINE_PER_CPU(struct cpu_signature, cpu_sig); -struct microcode_info { -unsigned int cpu; -uint32_t buffer_size; -int error; -char buffer[1]; -}; +/* + * Return a patch that covers current CPU. If there are multiple patches, + * return the one with the highest revision number. Return an error if no + * patch is found and an error occurs during the parsing process. Otherwise + * return NULL. 
+ */ +static struct microcode_patch *parse_blob(const char *buf, size_t len) +{ +if ( likely(!microcode_ops->collect_cpu_info(&this_cpu(cpu_sig))) ) +return microcode_ops->cpu_request_microcode(buf, len); + +return NULL; +} int microcode_resume_cpu(void) { @@ -220,15 +227,8 @@ void microcode_free_patch(struct microcode_patch *microcode_patch) xfree(microcode_patch); } -const struct microcode_patch *microcode_get_cache(void) -{ -ASSERT(spin_is_locked(&microcode_mutex)); - -return microcode_cache; -} - /* Return true if cache gets updated. Otherwise, return false */ -bool microcode_update_cache(struct microcode_patch *patch) +static bool microcode_update_cache(struct microcode_patch *patch) { ASSERT(spin_is_locked(&microcode_mutex)); @@ -249,49 +249,82 @@ bool microcode_update_cache(struct microcode_patch *patch) return true; } -static int microcode_update_cpu(const void *buf, size_t size) +/* + * Load a microcode update to current CPU. + * + * If no patch is provided, the cached patch will be loaded. Microcode update + * during APs bringup and CPU resuming falls into this case. + */ +static int microcode_update_cpu(const struct microcode_patch *patch) { -int err; -unsigned int cpu = smp_processor_id(); -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); +int err = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); -spin_lock(&microcode_mutex); +if ( unlikely(err) ) +return err; -err = microcode_ops->collect_cpu_info(sig); -if ( likely(!err) ) -err = microcode_ops->cpu_request_microcode(buf, size); +spin_lock(&microcode_mutex); +if ( patch ) +er
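The late-load control flow that falls out of this split is roughly (a sketch with error paths trimmed; the rendezvous helper name is hypothetical):

    /* Parse once, on the CPU issuing the hypercall ... */
    patch = parse_blob(buffer, len);
    if ( IS_ERR(patch) )
        ret = PTR_ERR(patch);
    else if ( !patch )
        ret = -ENOENT;
    else
        /* ... then every CPU only applies: no parsing or allocation in
         * the stop_machine-like context. */
        ret = load_on_all_cpus(patch);   /* hypothetical helper */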
[Xen-devel] [PATCH v11 0/7] improve late microcode loading
Changes in v11: - reject late ucode loading if any core is parked - correct the usage of microcode_mutex in microcode_update_cpu() - extend 'ucode' boot option to enable/disable ucode loading in NMI - drop the last two patches of v10 (about BDF90 and wbinvd; I haven't got answers to the open questions yet). - other minor changes are described in each patch's change log Regarding changes to the AMD side, I didn't do any testing of them due to lack of hardware. The intention of this series is to make the late microcode loading more reliable by rendezvousing all cpus in stop_machine context. This idea comes from Ashok. I am porting his linux patch to Xen (see patch 4 for more details). This series includes the below changes: 1. Patch 1-3: cleanup and preparation for synchronizing ucode loading 2. Patch 4: synchronize late microcode loading 3. Patch 5: support parallel microcode updates on different cores 4. Patch 6: rendezvous CPUs in NMI handler and load ucode 5. Patch 7: reject late ucode loading if any core is parked Currently, late microcode loading does a lot of things including parsing the microcode blob, checking the signature/revision and performing the update. Putting all of them into stop_machine context is a bad idea because of complexity (one issue I observed is that memory allocation triggered an assertion in stop_machine context). To simplify the load process, parsing microcode is moved out of the load process. The remaining parts of the load process are put into stop_machine context. Previous change log: Major changes in version 10: - add back the patch to call wbinvd() conditionally - add a patch to disable late loading due to BDF90 - rendezvous CPUs in NMI handler and load ucode. But provide an option to disable this behavior. - avoid the call of self_nmi() on the control thread because it may trigger the unknown_nmi_error() in do_nmi(). - ensure ->start_update is called during system resuming from suspension Changes in version 9: - add Jan's Reviewed-by - rendezvous threads in NMI handler to disable NMI. Note that NMI can be served as usual on threads that are chosen to initiate ucode loading on each core. - avoid unnecessary memory allocation or copy when creating a microcode patch (patch 12) - rework patch 1 to avoid microcode_update_match() being used to compare two arbitrary updates. - call .end_update in early loading path. Changes in version 8: - block #NMI handling during microcode loading (Patch 16) - Don't assume that all CPUs in the system have loaded the same ucode. So when parsing a blob, we attempt to save a patch as long as it matches with the current cpu signature regardless of the revision of the patch. And also for loading, we only require that the patch to be loaded isn't older than the cached one. - store an update after the first successful loading on a CPU - remove the patch that calls wbinvd() unconditionally before microcode loading. It is under internal discussion. - divide two big patches into several patches to improve readability. Changes in version 7: - cache one microcode update rather than a list of them. Assuming that all CPUs (including those that will be plugged in later) in the system have the same signature, an update that matches one CPU should match the others. Thus, one update is enough for microcode updating during CPU hot-plug and resuming. - To handle load failure, microcode update is cached after it is applied to avoid a broken update overriding a validated one. 
Unvalidated microcode updates are passed by arguments rather than another global variable, which is where this series slightly differs from Roger's suggestion in: https://lists.xen.org/archives/html/xen-devel/2019-03/msg00776.html - incorporate Sergey's patch (patch 10) to fix a bug: we maintain a variable to reflect the current microcode revision. But in some cases, this variable isn't initialized during system boot time, which results in falsely reporting that the processor is susceptible to some known vulnerabilities. - fix issues reported by Sergey: https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00901.html - Responses to Sergey/Roger/Wei/Ashok's other comments. Major changes in version 6: - run wbinvd before updating microcode (patch 10) - add a userspace tool for late microcode update (patch 1) - scale time to wait by the number of remaining CPUs to respond - remove 'cpu' parameters from some related callbacks and functions - save a ucode patch only if its supported CPU is allowed to mix with the current cpu. Changes in version 5: - support parallel microcode updates for all cores (see patch 8) - Address Roger's comments on the last version. Chao Gao (7): microcode: split out apply_microcode() from cpu_request_microcode() microcode: unify ucode loading during system bootup and resuming microcode: reduce memory allocation and copy when creating a patch x86/microcode: Synchronize late microcode
[Xen-devel] [PATCH v11 3/7] microcode: reduce memory allocation and copy when creating a patch
To create a microcode patch from a vendor-specific update, allocate_microcode_patch() copied everything from the update. This is not efficient. Essentially, we just need to go through ucodes in the blob, find the one with the newest revision and install it into the microcode_patch. In the process, buffers like mc_amd, equiv_cpu_table (on AMD side), and mc (on Intel side) can be reused. microcode_patch is now allocated only after it is certain that there is a matching ucode. Signed-off-by: Chao Gao Reviewed-by: Roger Pau Monné Reviewed-by: Jan Beulich --- Changes in v11: - correct parameter type issues of get_next_ucode_from_buffer Changes in v10: - avoid unnecessary type casting * introduce compare_header on AMD side * specify the type of the first parameter of get_next_ucode_from_buffer() on Intel side Changes in v9: - new --- xen/arch/x86/microcode_amd.c | 112 + xen/arch/x86/microcode_intel.c | 67 +--- 2 files changed, 69 insertions(+), 110 deletions(-) diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 0199308..9a8f179 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -194,36 +194,6 @@ static bool match_cpu(const struct microcode_patch *patch) return patch && (microcode_fits(patch->mc_amd) == NEW_UCODE); } -static struct microcode_patch *alloc_microcode_patch( -const struct microcode_amd *mc_amd) -{ -struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch); -struct microcode_amd *cache = xmalloc(struct microcode_amd); -void *mpb = xmalloc_bytes(mc_amd->mpb_size); -struct equiv_cpu_entry *equiv_cpu_table = -xmalloc_bytes(mc_amd->equiv_cpu_table_size); - -if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table ) -{ -xfree(microcode_patch); -xfree(cache); -xfree(mpb); -xfree(equiv_cpu_table); -return ERR_PTR(-ENOMEM); -} - -memcpy(mpb, mc_amd->mpb, mc_amd->mpb_size); -cache->mpb = mpb; -cache->mpb_size = mc_amd->mpb_size; -memcpy(equiv_cpu_table, mc_amd->equiv_cpu_table, - mc_amd->equiv_cpu_table_size); -cache->equiv_cpu_table = equiv_cpu_table; -cache->equiv_cpu_table_size = mc_amd->equiv_cpu_table_size; -microcode_patch->mc_amd = cache; - -return microcode_patch; -} - static void free_patch(void *mc) { struct microcode_amd *mc_amd = mc; @@ -236,6 +206,17 @@ static void free_patch(void *mc) } } +static enum microcode_match_result compare_header( +const struct microcode_header_amd *new_header, +const struct microcode_header_amd *old_header) +{ +if ( new_header->processor_rev_id == old_header->processor_rev_id ) +return (new_header->patch_id > old_header->patch_id) ? NEW_UCODE + : OLD_UCODE; + +return MIS_UCODE; +} + static enum microcode_match_result compare_patch( const struct microcode_patch *new, const struct microcode_patch *old) { @@ -246,11 +227,7 @@ static enum microcode_match_result compare_patch( ASSERT(microcode_fits(new->mc_amd) != MIS_UCODE); ASSERT(microcode_fits(old->mc_amd) != MIS_UCODE); -if ( new_header->processor_rev_id == old_header->processor_rev_id ) -return (new_header->patch_id > old_header->patch_id) ? 
-NEW_UCODE : OLD_UCODE; - -return MIS_UCODE; +return compare_header(new_header, old_header); } static int apply_microcode(const struct microcode_patch *patch) @@ -328,18 +305,10 @@ static int get_ucode_from_buffer_amd( return -EINVAL; } -if ( mc_amd->mpb_size < mpbuf->len ) -{ -if ( mc_amd->mpb ) -{ -xfree(mc_amd->mpb); -mc_amd->mpb_size = 0; -} -mc_amd->mpb = xmalloc_bytes(mpbuf->len); -if ( mc_amd->mpb == NULL ) -return -ENOMEM; -mc_amd->mpb_size = mpbuf->len; -} +mc_amd->mpb = xmalloc_bytes(mpbuf->len); +if ( !mc_amd->mpb ) +return -ENOMEM; +mc_amd->mpb_size = mpbuf->len; memcpy(mc_amd->mpb, mpbuf->data, mpbuf->len); pr_debug("microcode: CPU%d size %zu, block size %u offset %zu equivID %#x rev %#x\n", @@ -459,8 +428,9 @@ static struct microcode_patch *cpu_request_microcode(const void *buf, size_t bufsize) { struct microcode_amd *mc_amd; +struct microcode_header_amd *saved = NULL; struct microcode_patch *patch = NULL; -size_t offset = 0; +size_t offset = 0, saved_size = 0; int error = 0; unsigned int current_cpu_id; unsigned int equiv_cpu_id; @@ -550,29 +520,22 @@ static struct microcode_patch *cpu_request_microcode(const void *buf,
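For illustration, the new AMD-side selection logic boils down to a single three-way comparison; a minimal standalone sketch (simplified, self-contained types rather than the exact Xen structures):

    #include <stdint.h>

    enum microcode_match_result { OLD_UCODE, NEW_UCODE, MIS_UCODE };

    struct hdr {
        uint32_t processor_rev_id;
        uint32_t patch_id;
    };

    /* A newer patch_id for the same processor revision wins; a different
     * processor revision is a mismatch. */
    static enum microcode_match_result compare_header(const struct hdr *new,
                                                      const struct hdr *old)
    {
        if ( new->processor_rev_id == old->processor_rev_id )
            return new->patch_id > old->patch_id ? NEW_UCODE : OLD_UCODE;

        return MIS_UCODE;
    }

cpu_request_microcode() can then keep only the best candidate while scanning the blob and allocate the microcode_patch once, at the very end.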
[Xen-devel] [PATCH v11 4/7] x86/microcode: Synchronize late microcode loading
This patch ports microcode improvement patches from the linux kernel. Before you read any further: the early loading method is still the preferred one and you should always do that. The following patch improves the late loading mechanism for long running jobs and cloud use cases. Gather all cores and serialize the microcode update on them by doing it one-by-one to make the late update process as reliable as possible and avoid potential issues caused by the microcode update. Signed-off-by: Chao Gao Tested-by: Chao Gao [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff] [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7] Cc: Kevin Tian Cc: Jun Nakajima Cc: Ashok Raj Cc: Borislav Petkov Cc: Thomas Gleixner Cc: Andrew Cooper Cc: Jan Beulich --- Changes in v11: - Use the sample code of wait_for_state() provided by Jan - make wait_cpu_call{in,out} take unsigned int to avoid type casting - do assignment in while clause in control_thread_fn() to eliminate duplication. Changes in v10: - introduce wait_for_state() and set_state() helper functions - make wait_for_condition() return bool and take const void * - disable/enable watchdog in control thread - rename "master" and "slave" thread to "primary" and "secondary" Changes in v9: - log __builtin_return_address(0) when timeout - divide CPUs into three logical sets and they will call different functions during ucode loading. The 'control thread' is chosen to coordinate ucode loading on all CPUs. Since only the control thread would set 'loading_state', we can get rid of 'cmpxchg' stuff in v8. - s/rep_nop/cpu_relax - each thread updates its revision number itself - add XENLOG_ERR prefix for each line of multi-line log messages Changes in v8: - to support blocking #NMI handling during loading ucode * introduce a flag, 'loading_state', to mark the start or end of ucode loading. * use a bitmap for cpu callin since a cpu may stay in #NMI handling, so there are two places for a cpu to call in; with a bitmap it won't be counted twice. * don't wait for all CPUs callout, just wait for CPUs that perform the update. We have to do this because some threads may be stuck in NMI handling (where they cannot reach the rendezvous). - emit a warning if the system stays in stop_machine context for more than 1s - comment that rdtsc is fine while loading an update - use cmpxchg() to avoid panic being called on multiple CPUs - Propagate revision number to other threads - refine comments and prompt messages Changes in v7: - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int. - reword the comment above microcode_update_cpu() to clearly state that one thread per core should do the update. --- xen/arch/x86/microcode.c | 297 ++- 1 file changed, 267 insertions(+), 30 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 9c0e5c4..6c23879 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -30,18 +30,52 @@ #include #include #include +#include #include #include #include +#include +#include #include #include #include #include +/* + * Before performing a late microcode update on any thread, we + * rendezvous all cpus in stop_machine context. The timeout for + * waiting for cpu rendezvous is 30ms. It is the timeout used by + * live patching + */ +#define MICROCODE_CALLIN_TIMEOUT_US 30000 + +/* + * Timeout for each thread to complete update is set to 1s. It is a + * conservative choice considering all possible interference. 
+ */ +#define MICROCODE_UPDATE_TIMEOUT_US 1000000 + static module_t __initdata ucode_mod; static signed int __initdata ucode_mod_idx; static bool_t __initdata ucode_mod_forced; +static unsigned int nr_cores; + +/* + * These states help to coordinate CPUs during loading an update. + * + * The semantics of each state are as follows: + * - LOADING_PREPARE: initial state of 'loading_state'. + * - LOADING_CALLIN: CPUs are allowed to callin. + * - LOADING_ENTER: all CPUs have called in. Initiate ucode loading. + * - LOADING_EXIT: ucode loading is done or aborted. + */ +static enum { +LOADING_PREPARE, +LOADING_CALLIN, +LOADING_ENTER, +LOADING_EXIT, +} loading_state; /* * If we scan the initramfs.cpio for the early microcode code @@ -190,6 +224,16 @@ static DEFINE_SPINLOCK(microcode_mutex); DEFINE_PER_CPU(struct cpu_signature, cpu_sig); /* + * Count the CPUs that have entered, exited the rendezvous and succeeded in + * microcode update during late microcode update respectively. + * + * Note that a bitmap is used for callin to allow a cpu to set a bit multiple + * times, which is required since a cpu may call in again from #NMI handling. + */ +static cpumask_t cpu_callin_map; +static atomic_t cpu_out, cpu_updated; + +/* * Return a patch that covers curre
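The wait_for_state()/set_state() helpers named in the changelog reduce to a publish/poll pair on 'loading_state'. A minimal sketch of that shape, assuming Xen's ACCESS_ONCE(), smp_wmb() and udelay() (the real helpers are richer, e.g. they also bail out early when the state jumps to LOADING_EXIT):

    static void set_state(unsigned int state)
    {
        /* Make prior writes visible before announcing the transition. */
        smp_wmb();
        ACCESS_ONCE(loading_state) = state;
    }

    static bool wait_for_state(unsigned int state, unsigned int timeout_us)
    {
        while ( ACCESS_ONCE(loading_state) != state )
        {
            if ( !timeout_us-- )
                return false;   /* caller logs the timeout and aborts */
            udelay(1);          /* assumed 1us delay primitive */
        }

        return true;
    }

This is what lets the control thread herd primaries and secondaries through LOADING_CALLIN, LOADING_ENTER and LOADING_EXIT without any shared lock.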
[Xen-devel] [PATCH v11 7/7] microcode: reject late ucode loading if any core is parked
If all threads of a core are parked, late ucode loading, which currently only loads ucode on online threads, would lead to differing ucode revisions in the system. In general, keeping the ucode revision consistent is less error-prone. To this end, late ucode loading is rejected if there is a parked thread without an online sibling thread. Two threads are on the same core or compute unit iff they have the same phys_proc_id and cpu_core_id/compute_unit_id. Based on phys_proc_id and cpu_core_id/compute_unit_id, a unique core id is generated for each thread, and a bitmap is used to reduce the number of comparisons. Signed-off-by: Chao Gao --- Alternatively, we could mask the thread id off the apicid and use that as the unique core id. It would need a new field in cpuinfo_x86 to record the mask for the thread id, so I didn't take that approach. --- xen/arch/x86/microcode.c| 75 + xen/include/asm-x86/processor.h | 1 + 2 files changed, 76 insertions(+) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index b9fa8bb..b70eb16 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -573,6 +573,64 @@ static int do_microcode_update(void *patch) return ret; } +static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift) +{ +unsigned int core_id = cpu_to_cu(cpu); + +if ( core_id == INVALID_CUID ) +core_id = cpu_to_core(cpu); + +return (cpu_to_socket(cpu) << socket_shift) + core_id; +} + +static int has_parked_core(void) +{ +int ret = 0; + +if ( park_offline_cpus ) +{ +unsigned int cpu, max_bits, core_width; +unsigned int max_sockets = 1, max_cores = 1; +struct cpuinfo_x86 *c = cpu_data; +unsigned long *bitmap; + +for_each_present_cpu(cpu) +{ +if ( x86_cpu_to_apicid[cpu] == BAD_APICID ) +continue; + +/* Note that cpu_to_socket() gets an ID starting from 0. */ +if ( cpu_to_socket(cpu) + 1 > max_sockets ) +max_sockets = cpu_to_socket(cpu) + 1; + +if ( c[cpu].x86_max_cores > max_cores ) +max_cores = c[cpu].x86_max_cores; +} + +core_width = fls(max_cores); +max_bits = max_sockets << core_width; +bitmap = xzalloc_array(unsigned long, BITS_TO_LONGS(max_bits)); +if ( !bitmap ) +return -ENOMEM; + +for_each_present_cpu(cpu) +{ +if ( cpu_online(cpu) || x86_cpu_to_apicid[cpu] == BAD_APICID ) +continue; + +__set_bit(unique_core_id(cpu, core_width), bitmap); +} + +for_each_online_cpu(cpu) +__clear_bit(unique_core_id(cpu, core_width), bitmap); + +ret = (find_first_bit(bitmap, max_bits) < max_bits); +xfree(bitmap); +} + +return ret; +} + int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) { int ret; @@ -611,6 +669,23 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) */ ASSERT(cpumask_first(&cpu_online_map) == nmi_cpu); +/* + * If there is a core with all of its threads parked, late loading may + * cause differing ucode revisions in the system. Refuse this operation. 
+ */ +ret = has_parked_core(); +if ( ret ) +{ +if ( ret > 0 ) +{ +printk(XENLOG_WARNING + "Ucode loading aborted: found a parked core\n"); +ret = -EPERM; +} +xfree(buffer); +goto put; +} + patch = parse_blob(buffer, len); xfree(buffer); if ( IS_ERR(patch) ) diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h index c92956f..753deec 100644 --- a/xen/include/asm-x86/processor.h +++ b/xen/include/asm-x86/processor.h @@ -171,6 +171,7 @@ extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c); #define cpu_to_core(_cpu) (cpu_data[_cpu].cpu_core_id) #define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id) +#define cpu_to_cu(_cpu) (cpu_data[_cpu].compute_unit_id) unsigned int apicid_to_socket(unsigned int); -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
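The id packing in unique_core_id() is simply the socket id shifted past the widest core id. For example, with max_cores = 6, fls(6) = 3, so socket 2 / core 5 maps to (2 << 3) + 5 = 21, and the bitmap needs max_sockets << 3 bits in total. Condensed (pack_core_id is a stand-in name for illustration):

    /* Pack (socket, core) into one bitmap index; core_width = fls(max_cores)
     * guarantees core ids from different sockets cannot collide. */
    static unsigned int pack_core_id(unsigned int socket, unsigned int core,
                                     unsigned int core_width)
    {
        return (socket << core_width) + core;
    }

One mark-and-clear pass over the bitmap (set a bit for every parked thread, clear it for every online thread) then leaves a bit set iff some core has no online thread at all.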
[Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode
When one core is loading ucode, handling NMI on sibling threads or on other cores in the system might be problematic. Rendezvousing all CPUs in the NMI handler prevents NMI acceptance during ucode loading. Basically, some work previously done in stop_machine context is moved to the NMI handler. Primary threads call in and load ucode in the NMI handler. Secondary threads wait for the completion of ucode loading on all CPU cores. An option is introduced to disable this behavior. The control thread doesn't call self_nmi() and so doesn't rendezvous in the NMI handler (in case of unknown_nmi_error() being triggered). The side effect is that the control thread might be handling an NMI, interacting with the old ucode in an uncontrolled way, while other threads are loading ucode. Update ucode on the control thread first to mitigate this issue. Signed-off-by: Sergey Dyasli Signed-off-by: Chao Gao --- Changes in v11: - Extend existing 'nmi' option rather than use a new one. - use per-cpu variable to store error code of xxx_nmi_work() - rename secondary_thread_work to secondary_nmi_work. - initialize nmi_patch to ZERO_BLOCK_PTR and make it static. - constify nmi_cpu - explain why control thread loads ucode first in patch description Changes in v10: - rewrite based on Sergey's idea and patch - add Sergey's SOB. - add an option to disable ucode loading in NMI handler - don't send IPI NMI to the control thread to avoid unknown_nmi_error() in do_nmi(). - add an assertion to make sure the cpu chosen to handle platform NMI won't send self NMI. Otherwise, there is a risk that we encounter unknown_nmi_error() and the system crashes. Changes in v9: - the control thread sends NMI to all other threads. Slave threads will stay in the NMI handling to prevent NMI acceptance during ucode loading. Note that self-nmi is invalid according to SDM. - s/rep_nop/cpu_relax - remove debug message in microcode_nmi_callback(). Printing debug messages would take a long time and the control thread may time out. - rebase and fix conflicts Changes in v8: - new --- docs/misc/xen-command-line.pandoc | 6 +- xen/arch/x86/microcode.c | 156 ++ xen/arch/x86/traps.c | 6 +- xen/include/asm-x86/nmi.h | 3 + 4 files changed, 138 insertions(+), 33 deletions(-) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index 832797e..8beb285 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -2036,7 +2036,7 @@ pages) must also be specified via the tbuf_size parameter. > `= unstable | skewed | stable:socket` ### ucode (x86) -> `= [ | scan]` +> `= List of [ | scan, nmi= ]` Specify how and where to find CPU microcode update blob. @@ -2057,6 +2057,10 @@ microcode in the cpio name space must be: - on Intel: kernel/x86/microcode/GenuineIntel.bin - on AMD : kernel/x86/microcode/AuthenticAMD.bin +'nmi' determines whether late loading is performed in the NMI handler or just in +stop_machine context. In the NMI handler, even NMIs are blocked, which is +considered safer. The default value is `true`. 
+ ### unrestricted_guest (Intel) > `= ` diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 6c23879..b9fa8bb 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -36,8 +36,10 @@ #include #include +#include #include #include +#include #include #include #include @@ -95,6 +97,9 @@ static struct ucode_mod_blob __initdata ucode_blob; */ static bool_t __initdata ucode_scan; +/* By default, ucode loading is done in NMI handler */ +static bool ucode_in_nmi = true; + /* Protected by microcode_mutex */ static struct microcode_patch *microcode_cache; @@ -105,23 +110,42 @@ void __init microcode_set_module(unsigned int idx) } /* - * The format is '[|scan]'. Both options are optional. + * The format is '[|scan, nmi=]'. Both options are optional. * If the EFI has forced which of the multiboot payloads is to be used, - * no parsing will be attempted. + * only nmi= is parsed. */ static int __init parse_ucode(const char *s) { -const char *q = NULL; +const char *ss; +int val, rc = 0; -if ( ucode_mod_forced ) /* Forced by EFI */ - return 0; +do { +ss = strchr(s, ','); +if ( !ss ) +ss = strchr(s, '\0'); -if ( !strncmp(s, "scan", 4) ) -ucode_scan = 1; -else -ucode_mod_idx = simple_strtol(s, &q, 0); +if ( (val = parse_boolean("nmi", s, ss)) >= 0 ) +ucode_in_nmi = val; +else if ( !ucode_mod_forced ) /* Not forced by EFI */ +{ +const char *q = NULL; + +if ( !strncmp(s, "scan", 4) ) +{ +ucode_scan = true; +q = s + 4; +} +else +ucode_mod_idx
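The parse_ucode() hunk above is cut off by the archive; for readability, the comma-separated walk it implements has roughly this overall shape (a cleaned-up sketch, not the verbatim patch):

    static int __init parse_ucode(const char *s)
    {
        const char *ss;
        int val, rc = 0;

        do {
            /* Delimit one comma-separated token as [s, ss). */
            ss = strchr(s, ',');
            if ( !ss )
                ss = strchr(s, '\0');

            if ( (val = parse_boolean("nmi", s, ss)) >= 0 )
                ucode_in_nmi = val;
            else if ( ss - s == 4 && !strncmp(s, "scan", 4) )
                ucode_scan = true;
            else
                rc = -EINVAL;  /* the real code also accepts a module index */

            s = ss + 1;        /* step past the ',' */
        } while ( *ss );

        return rc;
    }

The key property is that nmi= is always parsed, while the module-selection tokens are skipped when the EFI path has already forced a payload.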
[Xen-devel] [PATCH v11 2/7] microcode: unify ucode loading during system bootup and resuming
During system bootup and resuming, CPUs just load the cached ucode. So one unified function microcode_update_one() is introduced. It takes a boolean to indicate whether ->start_update should be called. Since early_microcode_update_cpu() is only called on the BSP (APs call the unified function), start_update is always true, so remove this parameter. There is a functional change: ->start_update is called on the BSP and ->end_update_percpu is called during system resuming. Neither was invoked by the previous microcode_resume_cpu(). Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- Changes in v10: - call ->start_update for system resume from suspension Changes in v9: - return -EOPNOTSUPP rather than 0 if microcode_ops is NULL in microcode_update_one() - rebase and fix conflicts. Changes in v8: - split out from the previous patch --- xen/arch/x86/acpi/power.c | 2 +- xen/arch/x86/microcode.c| 91 +++-- xen/arch/x86/smpboot.c | 5 +-- xen/include/asm-x86/processor.h | 4 +- 4 files changed, 45 insertions(+), 57 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 269b140..01e6aec 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -278,7 +278,7 @@ static int enter_state(u32 state) console_end_sync(); -microcode_resume_cpu(); +microcode_update_one(true); if ( !recheck_cpu_features(0) ) panic("Missing previously available feature(s)\n"); diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 3ea2a6e..9c0e5c4 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -203,24 +203,6 @@ static struct microcode_patch *parse_blob(const char *buf, size_t len) return NULL; } -int microcode_resume_cpu(void) -{ -int err; -struct cpu_signature *sig = &this_cpu(cpu_sig); - -if ( !microcode_ops ) -return 0; - -spin_lock(&microcode_mutex); - -err = microcode_ops->collect_cpu_info(sig); -if ( likely(!err) ) -err = microcode_ops->apply_microcode(microcode_cache); -spin_unlock(&microcode_mutex); - -return err; -} - void microcode_free_patch(struct microcode_patch *microcode_patch) { microcode_ops->free_patch(microcode_patch->mc); @@ -391,11 +373,38 @@ static int __init microcode_init(void) } __initcall(microcode_init); -int __init early_microcode_update_cpu(bool start_update) +/* Load a cached update to current cpu */ +int microcode_update_one(bool start_update) +{ +int err; + +if ( !microcode_ops ) +return -EOPNOTSUPP; + +microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); + +if ( start_update && microcode_ops->start_update ) +{ +err = microcode_ops->start_update(); +if ( err ) +return err; +} + +err = microcode_update_cpu(NULL); + +if ( microcode_ops->end_update_percpu ) +microcode_ops->end_update_percpu(); + +return err; +} + +/* BSP calls this function to parse ucode blob and then apply an update. 
*/ +int __init early_microcode_update_cpu(void) { int rc = 0; void *data = NULL; size_t len; +struct microcode_patch *patch; if ( !microcode_ops ) return -ENOSYS; @@ -411,44 +420,26 @@ int __init early_microcode_update_cpu(bool start_update) data = bootstrap_map(&ucode_mod); } -microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); - if ( !data ) return -ENOMEM; -if ( start_update ) +patch = parse_blob(data, len); +if ( IS_ERR(patch) ) { -struct microcode_patch *patch; - -patch = parse_blob(data, len); -if ( IS_ERR(patch) ) -{ -printk(XENLOG_WARNING "Parsing microcode blob error %ld\n", - PTR_ERR(patch)); -return PTR_ERR(patch); -} - -if ( !patch ) -return -ENOENT; - -spin_lock(&microcode_mutex); -rc = microcode_update_cache(patch); -spin_unlock(&microcode_mutex); -ASSERT(rc); - -if ( microcode_ops->start_update ) -rc = microcode_ops->start_update(); - -if ( rc ) -return rc; +printk(XENLOG_WARNING "Parsing microcode blob error %ld\n", + PTR_ERR(patch)); +return PTR_ERR(patch); } -rc = microcode_update_cpu(NULL); +if ( !patch ) +return -ENOENT; -if ( microcode_ops->end_update_percpu ) -microcode_ops->end_update_percpu(); +spin_lock(&microcode_mutex); +rc = microcode_update_cache(patch); +spin_unlock(&microcode_mutex); +ASSERT(rc); -return rc; +return microcode_update_one(true); } int __init early_microcode_init(void) @@ -468,7 +459,7 @@ int __init early_microcode_init(void) microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
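With the unified helper in place, the call sites collapse onto a single boolean choice; e.g. (condensed; the AP-side call is inferred from the smpboot.c entry in the diffstat above):

    /* S3 resume on the BSP: also run the vendor-wide ->start_update hook. */
    microcode_update_one(true);

    /* AP bringup: just load the cached patch on this cpu. */
    microcode_update_one(false);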
[Xen-devel] [PATCH v11 5/7] microcode: remove microcode_update_lock
microcode_update_lock is to prevent logical threads of the same core from updating microcode at the same time. But, being a global lock, it also prevented parallel microcode updating on different cores. Remove this lock in order to update microcode in parallel. It is safe because we have already ensured serialization of sibling threads at the caller side. 1. For late microcode update, do_microcode_update() ensures that only one sibling thread of a core can update microcode. 2. For microcode update during system startup or CPU-hotplug, microcode_mutex() guarantees update serialization of logical threads. 3. get/put_cpu_maps() prevents the concurrency of CPU-hotplug and late microcode update. Note that printks in apply_microcode() and svm_host_osvw_init() (for AMD only) are still processed sequentially. Signed-off-by: Chao Gao Reviewed-by: Jan Beulich --- Changes in v7: - reworked. Remove the complex lock logic introduced in v5 and v6. The microcode patch to be applied is passed as an argument without any global variable. Thus no lock is added to serialize potential readers/writers. Callers of apply_microcode() will guarantee the correctness: the patch pointed to by the argument won't be changed by others. Changes in v6: - introduce early_ucode_update_lock to serialize early ucode update. Changes in v5: - newly added --- xen/arch/x86/microcode_amd.c | 8 +--- xen/arch/x86/microcode_intel.c | 8 +--- 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 9a8f179..1e52f7f 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -74,9 +74,6 @@ struct mpbhdr { uint8_t data[]; }; -/* serialize access to the physical write */ -static DEFINE_SPINLOCK(microcode_update_lock); - /* See comment in start_update() for cases when this routine fails */ static int collect_cpu_info(struct cpu_signature *csig) { @@ -232,7 +229,6 @@ static enum microcode_match_result compare_patch( static int apply_microcode(const struct microcode_patch *patch) { -unsigned long flags; uint32_t rev; int hw_err; unsigned int cpu = smp_processor_id(); @@ -247,15 +243,13 @@ static int apply_microcode(const struct microcode_patch *patch) hdr = patch->mc_amd->mpb; -spin_lock_irqsave(&microcode_update_lock, flags); +BUG_ON(local_irq_is_enabled()); hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr); /* get patch id after patching */ rdmsrl(MSR_AMD_PATCHLEVEL, rev); -spin_unlock_irqrestore(&microcode_update_lock, flags); - /* * Some processors leave the ucode blob mapping as UC after the update. * Flush the mapping to regain normal cacheability. 
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index c083e17..9ededcc 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -93,9 +93,6 @@ struct extended_sigtable { #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE) -/* serialize access to the physical write to MSR 0x79 */ -static DEFINE_SPINLOCK(microcode_update_lock); - static int collect_cpu_info(struct cpu_signature *csig) { unsigned int cpu_num = smp_processor_id(); @@ -287,7 +284,6 @@ static enum microcode_match_result compare_patch( static int apply_microcode(const struct microcode_patch *patch) { -unsigned long flags; uint64_t msr_content; unsigned int val[2]; unsigned int cpu_num = raw_smp_processor_id(); @@ -302,8 +298,7 @@ static int apply_microcode(const struct microcode_patch *patch) mc_intel = patch->mc_intel; -/* serialize access to the physical write to MSR 0x79 */ -spin_lock_irqsave(&microcode_update_lock, flags); +BUG_ON(local_irq_is_enabled()); /* write microcode via MSR 0x79 */ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits); @@ -316,7 +311,6 @@ static int apply_microcode(const struct microcode_patch *patch) rdmsrl(MSR_IA32_UCODE_REV, msr_content); val[1] = (uint32_t)(msr_content >> 32); -spin_unlock_irqrestore(&microcode_update_lock, flags); if ( val[1] != mc_intel->hdr.rev ) { printk(KERN_ERR "microcode: CPU%d update from revision " -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
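The BUG_ON(local_irq_is_enabled()) added on both vendors encodes the new contract: apply_microcode() is only reached with interrupts off, and sibling serialization is the caller's job. Schematically, the caller-side guard looks like this (a sketch of the rule stated in point 1 above, not the exact Xen code):

    /* Only one thread per core performs the physical write; its siblings
     * merely re-read their revision afterwards. */
    if ( smp_processor_id() == cpumask_first(this_cpu(cpu_sibling_mask)) )
        ret = microcode_ops->apply_microcode(patch);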
Re: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode
On Fri, Sep 27, 2019 at 12:19:22PM +0200, Jan Beulich wrote: >On 26.09.2019 15:53, Chao Gao wrote: >> @@ -105,23 +110,42 @@ void __init microcode_set_module(unsigned int idx) >> } >> >> /* >> - * The format is '[|scan]'. Both options are optional. >> + * The format is '[|scan, nmi=]'. Both options are optional. >> * If the EFI has forced which of the multiboot payloads is to be used, >> - * no parsing will be attempted. >> + * only nmi= is parsed. >> */ >> static int __init parse_ucode(const char *s) >> { >> -const char *q = NULL; >> +const char *ss; >> +int val, rc = 0; >> >> -if ( ucode_mod_forced ) /* Forced by EFI */ >> - return 0; >> +do { >> +ss = strchr(s, ','); >> +if ( !ss ) >> +ss = strchr(s, '\0'); >> >> -if ( !strncmp(s, "scan", 4) ) >> -ucode_scan = 1; >> -else >> -ucode_mod_idx = simple_strtol(s, &q, 0); >> +if ( (val = parse_boolean("nmi", s, ss)) >= 0 ) >> +ucode_in_nmi = val; >> +else if ( !ucode_mod_forced ) /* Not forced by EFI */ >> +{ >> +const char *q = NULL; >> + >> +if ( !strncmp(s, "scan", 4) ) >> +{ >> +ucode_scan = true; > >I guess it would have resulted in more consistent code if you had >used parse_boolean() here, too. > >> @@ -222,6 +246,8 @@ const struct microcode_ops *microcode_ops; >> static DEFINE_SPINLOCK(microcode_mutex); >> >> DEFINE_PER_CPU(struct cpu_signature, cpu_sig); >> +/* Store error code of the work done in NMI handler */ >> +DEFINE_PER_CPU(int, loading_err); > >static > >> @@ -356,42 +383,88 @@ static void set_state(unsigned int state) >> smp_wmb(); >> } >> >> -static int secondary_thread_fn(void) >> +static int secondary_nmi_work(void) >> { >> -unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask)); >> +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); >> >> -if ( !wait_for_state(LOADING_CALLIN) ) >> -return -EBUSY; >> +return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY; >> +} >> + >> +static int primary_thread_work(const struct microcode_patch *patch) >> +{ >> +int ret; >> >> cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); >> >> -if ( !wait_for_state(LOADING_EXIT) ) >> +if ( !wait_for_state(LOADING_ENTER) ) >> return -EBUSY; >> >> -/* Copy update revision from the primary thread. */ >> -this_cpu(cpu_sig).rev = per_cpu(cpu_sig, primary).rev; >> +ret = microcode_ops->apply_microcode(patch); >> +if ( !ret ) >> +atomic_inc(&cpu_updated); >> +atomic_inc(&cpu_out); >> >> -return 0; >> +return ret; >> } >> >> -static int primary_thread_fn(const struct microcode_patch *patch) >> +static int primary_nmi_work(const struct microcode_patch *patch) >> +{ >> +return primary_thread_work(patch); >> +} > >Why this wrapper? The function signatures are identical. I guess >you want to emphasize the environment the function is to be used >in, so perhaps fine despite the redundancy. At least there's no >address taken of this function, so the compiler can eliminate it. > >> +static int secondary_thread_fn(void) >> +{ >> if ( !wait_for_state(LOADING_CALLIN) ) >> return -EBUSY; >> >> -cpumask_set_cpu(smp_processor_id(), &cpu_callin_map); >> +self_nmi(); >> >> -if ( !wait_for_state(LOADING_ENTER) ) >> +/* Copy update revision from the primary thread. */ >> +this_cpu(cpu_sig).rev = >> +per_cpu(cpu_sig, cpumask_first(this_cpu(cpu_sibling_mask))).rev; > >_alternative_instructions() takes specific care to avoid relying on >the NMI potentially not arriving synchronously (in which case you'd >potentially copy a not-yet-updated CPU signature above). 
I think the >same care wants applying here, which I guess would be another > >wait_for_state(LOADING_EXIT); > >> +return this_cpu(loading_err); >> +} >> + >> +static int primary_thread_fn(const struct microcode_patch *patch) >> +{ >> +if ( !wait_for_state(LOADING_CALLIN) ) >> return -EBUSY; >> >> -ret = microcode_ops->apply_microcode(patch); >> -if ( !ret ) >>
Re: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode
On Fri, Sep 27, 2019 at 03:55:00PM +0200, Jan Beulich wrote: >On 27.09.2019 15:53, Chao Gao wrote: >> On Fri, Sep 27, 2019 at 12:19:22PM +0200, Jan Beulich wrote: >>> On 26.09.2019 15:53, Chao Gao wrote: >>>> @@ -420,14 +498,23 @@ static int control_thread_fn(const struct >>>> microcode_patch *patch) >>>> return ret; >>>> } >>>> >>>> -/* Let primary threads load the given ucode update */ >>>> -set_state(LOADING_ENTER); >>>> - >>>> +/* Control thread loads ucode first while others are in NMI handler. >>>> */ >>>> ret = microcode_ops->apply_microcode(patch); >>>> if ( !ret ) >>>> atomic_inc(&cpu_updated); >>>> atomic_inc(&cpu_out); >>>> >>>> +if ( ret == -EIO ) >>>> +{ >>>> +printk(XENLOG_ERR >>>> + "Late loading aborted: CPU%u failed to update ucode\n", >>>> cpu); >>>> +set_state(LOADING_EXIT); >>>> +return ret; >>>> +} >>>> + >>>> +/* Let primary threads load the given ucode update */ >>>> +set_state(LOADING_ENTER); >>> >>> While the description goes to some lengths to explain this ordering of >>> updates, I still don't really see the point: How is it better for the >>> control CPU to have updated its ucode early and then hit an NMI before >>> the other CPUs have even started updating, than the other way around >>> in the opposite case? >> >> We want to be conservative here. If an ucode is to update something >> shared by a whole socket, for the latter case, control thread may >> be accessing things that are being updating by the ucode loading on >> other cores. It is not safe, just like sibling thread isn't expected >> to access features exposed by the old ucode when primary thread is >> loading ucode. > >Ah yes, considering a socket-wide effect didn't occur to me (although >it should have). So if you mention this aspect in the description, I >think I'm going to be fine with the change in this regard. Yet (as so >often) this raises another question: What about "secondary" sockets? >Shouldn't we entertain a similar two-step approach there then? No. The two-step approach is because control thread cannot call self_nmi() in case of triggering unknown_nmi_error() and what is done in the main NMI handler isn't well controlled. All cores on other sockets will rendezvous in NMI handler. It means every core's behavior on other sockets is well controlled. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
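The ordering agreed in this exchange can be read straight out of the control_thread_fn() hunk quoted above; condensed:

    /* Control thread loads ucode first, while all others spin in the
     * NMI handler and cannot consume half-updated socket-wide state. */
    ret = microcode_ops->apply_microcode(patch);
    if ( ret == -EIO )
    {
        set_state(LOADING_EXIT);   /* abort: release the rendezvous */
        return ret;
    }

    set_state(LOADING_ENTER);      /* only now may primary threads load */

The asymmetry is deliberate: on the control thread's own socket nothing else is guaranteed to be quiesced in an NMI-safe way, whereas every core on the other sockets is already held in the NMI handler.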
[Xen-devel] [PATCH v12] microcode: rendezvous CPUs in NMI handler and load ucode
When one core is loading ucode, handling NMI on sibling threads or on other cores in the system might be problematic. Rendezvousing all CPUs in the NMI handler prevents NMI acceptance during ucode loading. Basically, some work previously done in stop_machine context is moved to the NMI handler. Primary threads call in and load ucode in the NMI handler. Secondary threads wait for the completion of ucode loading on all CPU cores. An option is introduced to disable this behavior. The control thread doesn't call self_nmi() and so doesn't rendezvous in the NMI handler (in case of unknown_nmi_error() being triggered). The side effect is that the control thread might be handling an NMI while other threads are loading ucode. If a ucode is to update something shared by a whole socket, the control thread may be accessing things that are being updated by the ucode loading on other cores, which is not safe. Update ucode on the control thread first to mitigate this issue. Signed-off-by: Sergey Dyasli Signed-off-by: Chao Gao --- Note: I plan to finish the remaining patches (like handling parked CPUs, BDF90 and WBINVD; IMO not as important as this one) in RCs. So this v12 only has one patch. Changes in v12: - take care that the self NMI may not arrive synchronously. - explain why control thread loads ucode first in patch description. - use parse_boolean to parse "scan" field in "ucode" option. The change is compatible with the old style. - staticify loading_err - drop primary_nmi_work() Changes in v11: - Extend existing 'nmi' option rather than use a new one. - use per-cpu variable to store error code of xxx_nmi_work() - rename secondary_thread_work to secondary_nmi_work. - initialize nmi_patch to ZERO_BLOCK_PTR and make it static. - constify nmi_cpu - explain why control thread loads ucode first in patch description Changes in v10: - rewrite based on Sergey's idea and patch - add Sergey's SOB. - add an option to disable ucode loading in NMI handler - don't send IPI NMI to the control thread to avoid unknown_nmi_error() in do_nmi(). - add an assertion to make sure the cpu chosen to handle platform NMI won't send self NMI. Otherwise, there is a risk that we encounter unknown_nmi_error() and the system crashes. Changes in v9: - the control thread sends NMI to all other threads. Slave threads will stay in the NMI handling to prevent NMI acceptance during ucode loading. Note that self-nmi is invalid according to SDM. - s/rep_nop/cpu_relax - remove debug message in microcode_nmi_callback(). Printing debug messages would take a long time and the control thread may time out. - rebase and fix conflicts Changes in v8: - new --- docs/misc/xen-command-line.pandoc | 6 +- xen/arch/x86/microcode.c | 174 +++--- xen/arch/x86/traps.c | 6 +- xen/include/asm-x86/nmi.h | 3 + 4 files changed, 156 insertions(+), 33 deletions(-) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index fc64429..f5410b3 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -2053,7 +2053,7 @@ pages) must also be specified via the tbuf_size parameter. > `= unstable | skewed | stable:socket` ### ucode (x86) -> `= [ | scan]` +> `= List of [ | scan=, nmi= ]` Specify how and where to find CPU microcode update blob. @@ -2074,6 +2074,10 @@ microcode in the cpio name space must be: - on Intel: kernel/x86/microcode/GenuineIntel.bin - on AMD : kernel/x86/microcode/AuthenticAMD.bin +'nmi' determines whether late loading is performed in the NMI handler or just in +stop_machine context. In the NMI handler, even NMIs are blocked, which is +considered safer. 
The default value is `true`. + ### unrestricted_guest (Intel) > `= ` diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index b882ac8..3c0f72e 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -36,8 +36,10 @@ #include #include +#include #include #include +#include #include #include #include @@ -95,6 +97,9 @@ static struct ucode_mod_blob __initdata ucode_blob; */ static bool_t __initdata ucode_scan; +/* By default, ucode loading is done in NMI handler */ +static bool ucode_in_nmi = true; + /* Protected by microcode_mutex */ static struct microcode_patch *microcode_cache; @@ -105,23 +110,40 @@ void __init microcode_set_module(unsigned int idx) } /* - * The format is '[|scan]'. Both options are optional. - * If the EFI has forced which of the multiboot payloads is to be used, - * no parsing will be attempted. + * The format is '[|scan=, nmi=]'. Both options are + * optional. If the EFI has forced which of the multiboot payloads is to be + * used, only nmi= is parsed. */ static int __init parse_ucode(const char *s) { -const char *q = NULL; +const char *ss; +int val, rc = 0; -if ( ucode_mod_forced ) /* Forced by EFI
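The "self NMI may not arrive synchronously" item is the key v12 change: a secondary thread must not read data produced in the NMI path until the update has definitely finished. A condensed sketch of the resulting shape (simplified from the diff above):

    static int secondary_thread_fn(void)
    {
        if ( !wait_for_state(LOADING_CALLIN) )
            return -EBUSY;

        self_nmi();   /* may be latched and delivered later */

        /* Wait for the final state before trusting NMI-path results. */
        if ( !wait_for_state(LOADING_EXIT) )
            return -EBUSY;

        /* Copy the update revision from the primary thread. */
        this_cpu(cpu_sig).rev =
            per_cpu(cpu_sig, cpumask_first(this_cpu(cpu_sibling_mask))).rev;

        return this_cpu(loading_err);
    }

This mirrors the care _alternative_instructions() takes, as Jan pointed out in the v11 review.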
[Xen-devel] [PATCH for Xen 4.13] x86/msi: Don't panic if msix capability is missing
Currently, Xen isn't aware of device resets (initiated by dom0). Xen may access the device while the device cannot respond to config requests normally (e.g. after a device reset, the device may respond to config requests with CRS completions to indicate it needs more time to complete the reset; refer to pci_dev_wait() in the linux kernel for more detail). Here, don't assume the MSI-X capability is always visible, and return -EAGAIN to the caller. Signed-off-by: Chao Gao --- I didn't find a way to trigger the assertion in normal usages. It is found by an internal test: echo 1 to /sys/bus/pci//reset when the device is being used by a guest. Although the test is a little insane, it is better to avoid crashing Xen even in this case. --- xen/arch/x86/msi.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c index 76d4034..e2f3c6c 100644 --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1265,7 +1265,13 @@ int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg, pos = entry ? entry->msi_attrib.pos : pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX); -ASSERT(pos); +if ( unlikely(!pos) ) +{ +printk_once(XENLOG_WARNING +"%04x:%02x:%02x.%u MSI-X capability is missing\n", +seg, bus, slot, func); +return -EAGAIN; +} if ( reg >= pos && reg < msix_pba_offset_reg(pos) + 4 ) { -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
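For comparison, the Linux-side behaviour the message refers to is a bounded poll that treats the CRS "retry" value as "not ready yet". A rough standalone sketch of that idea (not Xen code; read32() stands in for a config-space read of the vendor/device ID dword, and 0xffff0001 is the value a root complex with CRS Software Visibility synthesizes while the device is still resetting):

    static bool device_ready_after_reset(unsigned int timeout_ms)
    {
        while ( timeout_ms-- )
        {
            uint32_t id = read32(PCI_VENDOR_ID);  /* hypothetical accessor */

            /* 0xffffffff: no response at all; 0xffff0001: CRS retry. */
            if ( id != 0xffffffff && id != 0xffff0001 )
                return true;
            mdelay(1);
        }

        return false;
    }

Until Xen has something along these lines tied to a reset notification, returning -EAGAIN is the conservative fallback.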
Re: [Xen-devel] [PATCH v11 7/7] microcode: reject late ucode loading if any core is parked
On Fri, Sep 27, 2019 at 01:19:16PM +0200, Jan Beulich wrote: >On 26.09.2019 15:53, Chao Gao wrote: >> If a core with all of its thread being parked, late ucode loading >> which currently only loads ucode on online threads would lead to >> differing ucode revisions in the system. In general, keeping ucode >> revision consistent would be less error-prone. To this end, if there >> is a parked thread doesn't have an online sibling thread, late ucode >> loading is rejected. >> >> Two threads are on the same core or computing unit iff they have >> the same phys_proc_id and cpu_core_id/compute_unit_id. Based on >> phys_proc_id and cpu_core_id/compute_unit_id, an unique core id >> is generated for each thread. And use a bitmap to reduce the >> number of comparison. >> >> Signed-off-by: Chao Gao >> --- >> Alternatively, we can mask the thread id off apicid and use it >> as the unique core id. It needs to introduce new field in cpuinfo_x86 >> to record the mask for thread id. So I don't take this way. > >It feels a little odd that you introduce a "custom" ID, but it >should be fine without going this alternative route. (You >wouldn't need a new field though, I think, as we've got the >x86_num_siblings one already.) > >What I continue to be unconvinced of is for the chosen approach >to be better than briefly unparking a thread on each core, as >previously suggested. It isn't so easy to go the same way as set_cx_pminfo(). 1. NMI handler on parked threads is changed to a nop. To load ucode in NMI handler, we have to switch back to normal NMI handler in default_idle(). But it conflicts with what the comments in play_dead() implies: it is not safe to call normal NMI handler after cpu_exit_clear(). 2. A precondition of unparking a thread on each core, we need to find out exactly all parked cores and wake up one thread of each of them. Then in theory, what this patch does is only part of unparking a thread on each core. I don't mean they are hard to address. But we need to take care of them. Given that, IMO, this patch is much straightforward. > >> --- a/xen/arch/x86/microcode.c >> +++ b/xen/arch/x86/microcode.c >> @@ -573,6 +573,64 @@ static int do_microcode_update(void *patch) >> return ret; >> } >> >> +static unsigned int unique_core_id(unsigned int cpu, unsigned int >> socket_shift) >> +{ >> +unsigned int core_id = cpu_to_cu(cpu); >> + >> +if ( core_id == INVALID_CUID ) >> +core_id = cpu_to_core(cpu); >> + >> +return (cpu_to_socket(cpu) << socket_shift) + core_id; >> +} >> + >> +static int has_parked_core(void) >> +{ >> +int ret = 0; > >I don't think you need the initializer here. > >> +if ( park_offline_cpus ) > >if ( !park_offline_cpus ) >return 0; > >would allow one level less of indentation of the main part of >the function body. > >> +{ >> +unsigned int cpu, max_bits, core_width; >> +unsigned int max_sockets = 1, max_cores = 1; >> +struct cpuinfo_x86 *c = cpu_data; >> +unsigned long *bitmap; > + >> +for_each_present_cpu(cpu) >> +{ >> +if ( x86_cpu_to_apicid[cpu] == BAD_APICID ) >> +continue; >> + >> +/* Note that cpu_to_socket() get an ID starting from 0. */ >> +if ( cpu_to_socket(cpu) + 1 > max_sockets ) > >Instead of "+ 1", why not >= ? > >> +max_sockets = cpu_to_socket(cpu) + 1; >> + >> +if ( c[cpu].x86_max_cores > max_cores ) >> +max_cores = c[cpu].x86_max_cores; > >What guarantees .x86_max_cores to be valid? Onlining a hot-added >CPU is a two step process afaict, XENPF_cpu_hotadd followed by >XENPF_cpu_online. 
In between the CPU would be marked present >(and cpu_add() would also have filled x86_cpu_to_apicid[cpu]), >but cpu_data[cpu] wouldn't have been filled yet afaict. This >also makes the results of the cpu_to_*() unreliable that you use >in unique_core_id(). Indeed. I agree. > >However, if we assume sufficient similarity between CPU >packages (as you've done elsewhere in this series iirc), this Yes. >may not be an actual problem. But it wants mentioning in a code >comment, I think. Plus at the very least you depend on the used >cpu_data[] fields to not contain unduly large values (and hence >you e.g. depend on cpu_data[] not gaining an initializer, >setting the three fields of interest to their INVALID_* values, >as currently done by identify_cpu()). Can we skip those threads whose socket ID is invalid and initialize the three fields in cpu_add()? Or maintain a bitmap for parked threads to help distinguish them from real offlined threads, and go through parked threads here? Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for Xen 4.13] x86/msi: Don't panic if msix capability is missing
On Mon, Sep 30, 2019 at 11:09:58AM +0200, Roger Pau Monné wrote: >On Mon, Sep 30, 2019 at 05:24:31AM +0800, Chao Gao wrote: >> Current, Xen isn't aware of device reset (initiated by dom0). Xen may >> access the device while device cannot respond to config requests >> normally (e.g. after device reset, device may respond to config >> requests with CRS completions to indicate it needs more time to >> complete a reset, refer to pci_dev_wait() in linux kernel for more >> detail). Here, don't assume msix capability is always visible and >> return -EAGAIN to the caller. >> >> Signed-off-by: Chao Gao >> --- >> I didn't find a way to trigger the assertion in normal usages. >> It is found by an internal test: echo 1 to /sys/bus/pci//reset >> when the device is being used by a guest. Although the test is a >> little insane, it is better to avoid crashing Xen even for this case. > >The hardware domain doing such things behind Xen's back is quite >likely to end badly, either hitting an ASSERT somewhere or with a >malfunctioning device. Xen should be signaled of when such reset is >happening, so it can also tear down the internal state of the >device. > >Xen could trap accesses to the FLR bit in order to detect device >resets, but that's only a way of performing a device reset, other >methods are likely more complicated to detect, and hence this would >only be a partial solution. > >Have you considered whether it's feasible to signal Xen that a device >reset is happening, so it can torn down the internal device state? I think it is feasible. But I am not sure whether it is necessary. As you said to me before, after detaching the device from a domain, the internal device state in Xen should have be reset. That's why hardware domain or other domainU can use the device again. So Xen has provided hypercalls to tear down the internal state. (IMO, the internal state includes interrupt binding and mapping, MMIO mapping. But I am not sure if I miss something). The question then becomes: should Xen tolerate hardware domain's misbehavior (resetting a device without tearing down internal state) or just panic? > >> --- >> xen/arch/x86/msi.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c >> index 76d4034..e2f3c6c 100644 >> --- a/xen/arch/x86/msi.c >> +++ b/xen/arch/x86/msi.c >> @@ -1265,7 +1265,13 @@ int pci_msi_conf_write_intercept(struct pci_dev >> *pdev, unsigned int reg, >> pos = entry ? entry->msi_attrib.pos >> : pci_find_cap_offset(seg, bus, slot, func, >>PCI_CAP_ID_MSIX); >> -ASSERT(pos); > >I think at least a comment should be added here describing why a >capability might suddenly disappear. Will do. > >> +if ( unlikely(!pos) ) >> +{ >> +printk_once(XENLOG_WARNING > >I'm not sure if printk_once is the best option, the message would be >printed only once, and for the first device that hits this. Ideally I >think it should be printed at least once for each device that hits >this condition. > >Alternatively you can turn this into a gprintk which would be good >enough IMO. Will do. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for Xen 4.13] x86/msi: Don't panic if msix capability is missing
On Mon, Sep 30, 2019 at 11:18:05AM +0200, Jan Beulich wrote: >On 29.09.2019 23:24, Chao Gao wrote: >> --- a/xen/arch/x86/msi.c >> +++ b/xen/arch/x86/msi.c >> @@ -1265,7 +1265,13 @@ int pci_msi_conf_write_intercept(struct pci_dev >> *pdev, unsigned int reg, >> pos = entry ? entry->msi_attrib.pos >> : pci_find_cap_offset(seg, bus, slot, func, >>PCI_CAP_ID_MSIX); >> -ASSERT(pos); >> +if ( unlikely(!pos) ) >> +{ >> +printk_once(XENLOG_WARNING >> +"%04x:%02x:%02x.%u MSI-X capability is missing\n", >> +seg, bus, slot, func); >> +return -EAGAIN; >> +} > >Besides agreeing with Roger's comments, whose access do we >intercept here at the time you observe the operation above >producing a zero "pos"? If it's Dom0, then surely there's a bug >in Dom0 doing the access in the first place when a reset hasn't >completed yet? >If it's a DomU, then is the reset happening >behind _its_ back as well (which is not going to end well)? Looks like it is Dom0. Xen should defend against Dom0 bugs, right? Here is the call trace: (XEN) memory_map:remove: dom1 gfn=f mfn=de000 nr=2000 (XEN) memory_map:remove: dom1 gfn=f4051 mfn=e0001 nr=3 (XEN) Assertion 'pos' failed at msi.c:1311 (XEN) ---[ Xen-4.13-unstable x86_64 debug=y Tainted: C ]--- (XEN) CPU:38 (XEN) RIP:e008:[] pci_msi_conf_write_intercept+0xd7/0x216 (XEN) RFLAGS: 00010246 CONTEXT: hypervisor (d0v1) (XEN) rax: rbx: 83087a446c50 rcx: (XEN) rdx: 830863c57fff rsi: 0293 rdi: 82d080498ee0 (XEN) rbp: 830863c579e0 rsp: 830863c579b0 r8: (XEN) r9: 830863692ae0 r10: r11: (XEN) r12: 00b2 r13: 830863c57a64 r14: (XEN) r15: 0089 cr0: 80050033 cr4: 003426e0 (XEN) cr3: 000812052000 cr2: 557d51fbc000 (XEN) fsb: 7f05f2caa400 gsb: 888194a4 gss: (XEN) ds: es: fs: gs: ss: cs: e008 (XEN) Xen code around (pci_msi_conf_write_intercept+0xd7/0x216): (XEN) 00 e8 d0 26 fd ff eb 85 <0f> 0b ba 05 00 00 00 be ff ff ff ff 48 89 df e8 (XEN) Xen stack trace from rsp=830863c579b0: (XEN)000257be 8900 0002 (XEN)830863c57a64 00b2 830863c57a18 82d080297d99 (XEN)8308636bb000 830863c57a64 00b2 0002 (XEN)8900 830863c57a50 82d08037d40b 0cfe (XEN)0002 0002 8308636bb000 830863c57a64 (XEN)830863c57a90 82d08037d5af 7fff8022854f 830863c57e30 (XEN)0002 0cfe 83086369c000 8308636bb000 (XEN)830863c57ad0 82d08037db65 7fff 0cfe (XEN)830863c57e30 82d0803fb7c0 (XEN)830863c57de8 82d0802bf35d 0004 (XEN)82d080387800 00ef00ef 830863c57bc0 82d7 (XEN)82d0 00ef 8305a473ae70 (XEN)830863c57b20 82cff000 0282 830863c57b60 (XEN)82d08023c27d 8305a473ae60 830863c57ba0 82d080248596 (XEN)00020040 8305a473ae60 0086 830863c57ba0 (XEN)82d08023c27d 0286 830863c57bb8 0040 (XEN)830863c57bc8 82d08026c747 (XEN) (XEN) 830863c57da0 (XEN)0003 80868086 (XEN) Xen call trace: (XEN)[] pci_msi_conf_write_intercept+0xd7/0x216 (XEN)[] pci_conf_write_intercept+0x68/0x72 (XEN)[] emul-priv-op.c#pci_cfg_ok+0xb5/0x146 (XEN)[] emul-priv-op.c#guest_io_write+0x113/0x20b (XEN)[] emul-priv-op.c#write_io+0xda/0xe4 (XEN)[] x86_emulate+0x11cf7/0x3169d (XEN)[] x86_emulate_wrapper+0x26/0x5f (XEN)[] pv_emulate_privileged_op+0x150/0x271 (XEN)[] do_general_protection+0x20b/0x257 (XEN)[] x86_64/entry.S#handle_exception_saved+0x68/0x94 Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/pt: skip setup of posted format IRTE when gvec is 0
On Thu, May 02, 2019 at 10:20:09AM +0200, Roger Pau Monné wrote: >On Wed, May 01, 2019 at 12:41:13AM +0800, Chao Gao wrote: >> On Tue, Apr 30, 2019 at 11:30:33AM +0200, Roger Pau Monné wrote: >> >On Tue, Apr 30, 2019 at 05:01:21PM +0800, Chao Gao wrote: >> >> On Tue, Apr 30, 2019 at 01:56:31AM -0600, Jan Beulich wrote: >> >> >>>> On 30.04.19 at 07:19, wrote: >> >> >> When testing with an UP guest with a pass-thru device with vt-d pi >> >> >> enabled in host, we observed that guest couldn't receive interrupts >> >> >> from that pass-thru device. Dumping IRTE, we found the corresponding >> >> >> IRTE is set to posted format with "vector" field as 0. >> >> >> >> >> >> We would fall into this issue when guest used the pirq format of MSI >> >> >> (see the comment xen_msi_compose_msg() in linux kernel). As 'dest_id' >> >> >> is repurposed, skip migration which is based on 'dest_id'. >> >> > >> >> >I've gone through all uses of gvec, and I couldn't find any existing >> >> >special casing of it being zero. I assume this is actually communication >> >> >between the kernel and qemu, >> >> >> >> Yes. >> >> >> >> >in which case I'd like to see an >> >> >explanation of why the issue needs to be addressed in Xen rather >> >> >than qemu. >> >> >> >> To call pirq_guest_bind() to configure irq_desc properly. >> >> Especially, we append a pointer of struct domain to 'action->guest' in >> >> pirq_guest_bind(). Then __do_IRQ_guest() knows domains that are interested >> >> in this interrupt and injects an interrupt to those domains. >> >> >> >> >Otherwise, if I've overlooked something, would you >> >> >mind pointing out where such special casing lives in Xen? >> >> > >> >> >In any event it doesn't look correct to skip migration altogether in >> >> >that case. I'd rather expect it to require getting done differently. >> >> >After all there still is a (CPU, vector) tuple associated with that >> >> >{,p}IRQ if it's not posted, and hvm_migrate_pirq() is a no-op if it is >> >> >posted. >> >> >> >> Here, we try to set irq's target cpu to the cpu which the vmsi's target >> >> vcpu >> >> is running on to reduce IPI. But the 'dest_id' field which used to >> >> indicate the vmsi's target vcpu is missing, we don't know which cpu we >> >> should >> >> migrate the irq to. One possible choice is the 'chn->notify_vcpu_id' >> >> used in send_guest_pirq(). Do you think this choice is fine? >> > >> >I think that by the time the device model calls into pirq_guest_bind >> >the PIRQ won't be bound to any event channel, so pirq->evtchn would be >> >0. >> >> Then skip pirq migration is the only choice here? And we can migrate >> pirq when it is bound with an event channel. >> >> > >> >Note that the binding of the PIRQ with the event channel is done >> >afterwards in xen_hvm_setup_msi_irqs by the Linux kernel. >> > >> >It seems like the device model should be using a different set of >> >hypercalls to setup a PIRQ that is routed over an event channel, ie: >> >PHYSDEVOP_map_pirq and friends. >> >> Now qemu is using PHYSDEVOP_map_pirq. Right? > >Oh yes, QEMU already uses PHYSDEVOP_map_pirq to setup the interrupt. >Then I'm not sure I see why QEMU calls XEN_DOMCTL_bind_pt_irq for >interrupts that are routed over event channels. That hypercall is used As I said above, it is to call pirq_guest_bind() to hook up to irq handler. XEN_DOMCTL_bind_pt_pirq does two things: #1. bind pirq with a guest interrupt #2. 
register (domain, pirq) with the interrupt handler. Currently, for a pirq routed to an evtchn, #1 is done by another hypercall, evtchn_bind_pirq, and #2 is done in XEN_DOMCTL_bind_pt_irq. >to bind a pirq to a native guest interrupt injection mechanism, which >shouldn't be used if the interrupt is going to be delivered over an >event channel. > >Can you see about avoiding the XEN_DOMCTL_bind_pt_irq call in QEMU if >the interrupt is going to be routed over an event channel? Yes. It is doable. But it needs changes in both qemu and Xen, and some tricks to be compatible with old qemu. I prefer not to touch qemu and keep q
Re: [Xen-devel] [PATCH] x86/pt: skip setup of posted format IRTE when gvec is 0
On Mon, May 06, 2019 at 03:39:40AM -0600, Jan Beulich wrote: On 06.05.19 at 06:44, wrote: >> On Thu, May 02, 2019 at 10:20:09AM +0200, Roger Pau Monné wrote: >>>Can you see about avoiding the XEN_DOMCTL_bind_pt_irq call in QEMU if >>>the interrupt is going to be routed over an event channel? >> >> Yes. It is doable. But it needs changes in both qemu and Xen and some tricks >> to be compatible with old qemu. > >That would be ugly indeed. > >> I prefer not to touch qemu and keep qemu unware of MSI's "routing over >> evtchn", >> like the patch below: > >Is this meant as a replacement to your original patch, or as an >add-on? In any event it's not immediately clear to me how A replacement. >... > >> --- a/xen/common/event_channel.c >> +++ b/xen/common/event_channel.c >> @@ -504,10 +504,7 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind) >> if ( !info ) >> ERROR_EXIT(-ENOMEM); >> info->evtchn = port; >> -rc = (!is_hvm_domain(d) >> - ? pirq_guest_bind(v, info, >> -!!(bind->flags & BIND_PIRQ__WILL_SHARE)) >> - : 0); >> +rc = pirq_guest_bind(v, info, !!(bind->flags & BIND_PIRQ__WILL_SHARE)); > >... this becoming unconditional won't conflict with its other >invocation ... Yes. It conflicts with the call in pt_irq_create_bind() for non-MSI case. > >> --- a/xen/drivers/passthrough/io.c >> +++ b/xen/drivers/passthrough/io.c >> @@ -346,6 +346,12 @@ int pt_irq_create_bind( >> uint32_t gflags = pt_irq_bind->u.msi.gflags & >>~XEN_DOMCTL_VMSI_X86_UNMASKED; >> >> +if ( !pt_irq_bind->u.msi.gvec ) >> +{ >> +spin_unlock(&d->event_lock); >> +return 0; >> +} > >... further down in this function, for the non-MSI case. >Similarly I wonder whether the respective unbind function >invocations then won't go (or already are?) out of sync. The "out of sync" issue seems hard to be solved. It is error-prone to move pirq_guest_(un)bind from one hypercall to another. On second thought, I plan to go back to my original patch. The only issue for that patch is how to migrate irq properly to avoid IPI during interrupt delivery. Actually, current code has set irq affinity correctly: 1. pirq is bound to vcpu[0] in pt_irq_create_bind(). It also sets corresponding physical irq's affinity to vcpu[0]. 2. evtchn is bound to vcpu[0] in evtchn_bind_pirq(). During delivery, we would send pirq to vcpu[0] and no IPI is required. 3. If evtchn is rebound to another vcpu in evtchn_bind_vcpu(), the affinity of the physical irq is already reconfigured there. Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
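Point 3 of the closing argument relies on the steering that already happens when the guest rebinds the event channel to another vcpu; condensed from evtchn_bind_vcpu()'s ECS_PIRQ handling (simplified):

    case ECS_PIRQ:
        chn->notify_vcpu_id = vcpu_id;
        /* Chase the vcpu with the physical irq so delivery needs no IPI. */
        pirq_set_affinity(d, chn->u.pirq.irq,
                          cpumask_of(d->vcpu[vcpu_id]->processor));
        break;

So even with the gvec == 0 early return in pt_irq_create_bind(), the physical irq still ends up tracking whichever vcpu the event channel is bound to.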
[Xen-devel] [PATCH v7 09/10] microcode: remove microcode_update_lock
microcode_update_lock is to prevent logical threads of the same core from updating microcode at the same time. But, being a global lock, it also prevented parallel microcode updating on different cores. Remove this lock in order to update microcode in parallel. It is safe because we have already ensured serialization of sibling threads at the caller side. 1. For late microcode update, do_microcode_update() ensures that only one sibling thread of a core can update microcode. 2. For microcode update during system startup or CPU-hotplug, microcode_mutex() guarantees update serialization of logical threads. 3. get/put_cpu_maps() prevents the concurrency of CPU-hotplug and late microcode update. Note that printks in apply_microcode() and svm_host_osvw_init() (for AMD only) are still processed sequentially. Signed-off-by: Chao Gao --- Changes in v7: - reworked. Remove the complex lock logic introduced in v5 and v6. The microcode patch to be applied is passed as an argument without any global variable. Thus no lock is added to serialize potential readers/writers. Callers of apply_microcode() will guarantee the correctness: the patch pointed to by the argument won't be changed by others. Changes in v6: - introduce early_ucode_update_lock to serialize early ucode update. Changes in v5: - newly added --- xen/arch/x86/microcode_amd.c | 8 +--- xen/arch/x86/microcode_intel.c | 8 +--- 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index c819028..b64a58d 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -74,9 +74,6 @@ struct mpbhdr { uint8_t data[]; }; -/* serialize access to the physical write */ -static DEFINE_SPINLOCK(microcode_update_lock); - /* See comment in start_update() for cases when this routine fails */ static int collect_cpu_info(struct cpu_signature *csig) { @@ -251,7 +248,6 @@ static enum microcode_match_result compare_patch( static int apply_microcode(const struct microcode_patch *patch) { -unsigned long flags; uint32_t rev; int hw_err; unsigned int cpu = smp_processor_id(); @@ -263,15 +259,13 @@ static int apply_microcode(const struct microcode_patch *patch) hdr = patch->mc_amd->mpb; -spin_lock_irqsave(&microcode_update_lock, flags); +BUG_ON(local_irq_is_enabled()); hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr); /* get patch id after patching */ rdmsrl(MSR_AMD_PATCHLEVEL, rev); -spin_unlock_irqrestore(&microcode_update_lock, flags); - /* * Some processors leave the ucode blob mapping as UC after the update. * Flush the mapping to regain normal cacheability. 
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index bfb48ce..94a1561 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -93,9 +93,6 @@ struct extended_sigtable { #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE) -/* serialize access to the physical write to MSR 0x79 */ -static DEFINE_SPINLOCK(microcode_update_lock); - static int collect_cpu_info(struct cpu_signature *csig) { unsigned int cpu_num = smp_processor_id(); @@ -295,7 +292,6 @@ static struct microcode_patch *allow_microcode_patch( static int apply_microcode(const struct microcode_patch *patch) { -unsigned long flags; uint64_t msr_content; unsigned int val[2]; unsigned int cpu_num = raw_smp_processor_id(); @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch) mc_intel = patch->mc_intel; -/* serialize access to the physical write to MSR 0x79 */ -spin_lock_irqsave(&microcode_update_lock, flags); +BUG_ON(local_irq_is_enabled()); /* * Writeback and invalidate caches before updating microcode to avoid @@ -327,7 +322,6 @@ static int apply_microcode(const struct microcode_patch *patch) rdmsrl(MSR_IA32_UCODE_REV, msr_content); val[1] = (uint32_t)(msr_content >> 32); -spin_unlock_irqrestore(&microcode_update_lock, flags); if ( val[1] != mc_intel->hdr.rev ) { printk(KERN_ERR "microcode: CPU%d update from revision " -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
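The v6 cover-letter item "run wbinvd before updating microcode" is visible in the Intel hunk above, which the archive cuts off just after the comment; the straight-line shape of that sequence is:

    BUG_ON(local_irq_is_enabled());

    /* Writeback and invalidate caches before updating microcode to avoid
     * consuming stale data. */
    wbinvd();

    /* write microcode via MSR 0x79 */
    wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits);

Flushing first matters because the loader may read the blob through a mapping whose cacheability the update itself can change.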
[Xen-devel] [PATCH v7 05/10] microcode: remove pointless 'cpu' parameter
Some callbacks in microcode_ops or related functions take a cpu id parameter. But at current call sites, the cpu id parameter is always equal to the current cpu id. Some of them even use an assertion to guarantee this. Remove this redundant 'cpu' parameter. Signed-off-by: Chao Gao --- xen/arch/x86/acpi/power.c | 2 +- xen/arch/x86/microcode.c| 12 ++-- xen/arch/x86/microcode_amd.c| 35 +-- xen/arch/x86/microcode_intel.c | 25 + xen/arch/x86/smpboot.c | 2 +- xen/include/asm-x86/microcode.h | 7 +++ xen/include/asm-x86/processor.h | 2 +- 7 files changed, 34 insertions(+), 51 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index aecc754..4f21903 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -253,7 +253,7 @@ static int enter_state(u32 state) console_end_sync(); -microcode_resume_cpu(0); +microcode_resume_cpu(); if ( !recheck_cpu_features(0) ) panic("Missing previously available feature(s)\n"); diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 0c01dfa..16a6d50 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -196,19 +196,19 @@ struct microcode_info { char buffer[1]; }; -int microcode_resume_cpu(unsigned int cpu) +int microcode_resume_cpu(void) { int err; -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); +struct cpu_signature *sig = &this_cpu(cpu_sig); if ( !microcode_ops ) return 0; spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, sig); +err = microcode_ops->collect_cpu_info(sig); if ( likely(!err) ) -err = microcode_ops->apply_microcode(cpu); +err = microcode_ops->apply_microcode(); spin_unlock(&microcode_mutex); return err; @@ -255,9 +255,9 @@ static int microcode_update_cpu(const void *buf, size_t size) spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, sig); +err = microcode_ops->collect_cpu_info(sig); if ( likely(!err) ) -err = microcode_ops->cpu_request_microcode(cpu, buf, size); +err = microcode_ops->cpu_request_microcode(buf, size); spin_unlock(&microcode_mutex); return err; diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 93af2c9..0144df1 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -78,8 +78,9 @@ struct mpbhdr { static DEFINE_SPINLOCK(microcode_update_lock); /* See comment in start_update() for cases when this routine fails */ -static int collect_cpu_info(unsigned int cpu, struct cpu_signature *csig) +static int collect_cpu_info(struct cpu_signature *csig) { +unsigned int cpu = smp_processor_id(); struct cpuinfo_x86 *c = &cpu_data[cpu]; memset(csig, 0, sizeof(*csig)); @@ -152,18 +153,15 @@ static bool_t find_equiv_cpu_id(const struct equiv_cpu_entry *equiv_cpu_table, return 0; } -static bool_t microcode_fits(const struct microcode_amd *mc_amd, - unsigned int cpu) +static bool microcode_fits(const struct microcode_amd *mc_amd) { +unsigned int cpu = smp_processor_id(); const struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); const struct microcode_header_amd *mc_header = mc_amd->mpb; const struct equiv_cpu_entry *equiv_cpu_table = mc_amd->equiv_cpu_table; unsigned int current_cpu_id; unsigned int equiv_cpu_id; -/* We should bind the task to the CPU */ -BUG_ON(cpu != raw_smp_processor_id()); - current_cpu_id = cpuid_eax(0x0001); if ( !find_equiv_cpu_id(equiv_cpu_table, current_cpu_id, &equiv_cpu_id) ) @@ -192,9 +190,7 @@ static bool_t microcode_fits(const struct microcode_amd *mc_amd, static bool match_cpu(const struct microcode_patch *patch) { -if ( !patch ) -return false; -return
microcode_fits(patch->mc_amd, smp_processor_id()); +return patch ? microcode_fits(patch->mc_amd) : false; } static struct microcode_patch *alloc_microcode_patch( @@ -253,18 +249,16 @@ static enum microcode_match_result compare_patch( return MIS_UCODE; } -static int apply_microcode(unsigned int cpu) +static int apply_microcode(void) { unsigned long flags; uint32_t rev; int hw_err; +unsigned int cpu = smp_processor_id(); struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); const struct microcode_header_amd *hdr; const struct microcode_patch *patch = microcode_get_cache(); -/* We should bind the task to the CPU */ -BUG_ON(raw_smp_processor_id() != cpu); - if ( !match_cpu(patch) ) return -EINVAL; @@ -435,14 +429,14 @@ static const unsigned int final_levels[] = { 0x01af }; -static bool_t check_final_patch_levels(unsigned int cpu) +static bool check_final_patch_levels
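A small self-contained sketch (simplified types, a single CPU, made-up signature values; not Xen code) of the refactoring pattern this patch applies throughout: the callback derives the CPU id itself, so the BUG_ON(cpu != raw_smp_processor_id()) checks become unnecessary by construction:

    #include <stdio.h>

    struct cpu_signature { unsigned int sig, pf, rev; };

    /* Stub standing in for Xen's smp_processor_id(). */
    static unsigned int smp_processor_id(void) { return 0; }

    /* Stub standing in for the per-CPU cpu_sig area. */
    static struct cpu_signature cpu_sig[1] = { { 0x000306c3, 0x2, 0x20 } };

    /* After the patch: no 'cpu' parameter to get wrong. */
    static int collect_cpu_info(struct cpu_signature *csig)
    {
        *csig = cpu_sig[smp_processor_id()];
        return 0;
    }

    int main(void)
    {
        struct cpu_signature s;

        collect_cpu_info(&s);
        printf("sig=%#x pf=%#x rev=%#x\n", s.sig, s.pf, s.rev);
        return 0;
    }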
[Xen-devel] [PATCH v7 03/10] microcode: introduce a global cache of ucode patch
to replace the current per-cpu cache 'uci->mc'. With the assumption that all CPUs in the system have the same signature (family, model, stepping and 'pf'), a microcode update that matches one CPU should match the others. Having multiple microcode revisions on different CPUs would make the system unstable and should be avoided. Hence, caching only one microcode update is good enough for all cases. Introduce a global variable, microcode_cache, to store the newest matching microcode update. Whenever we get a new valid microcode update, its revision id is compared against that of the cached update to determine whether the "microcode_cache" needs to be replaced. This global cache is now loaded to the CPU in apply_microcode(). All operations on the cache are expected to be done with the 'microcode_mutex' held. Note that I deliberately avoid touching 'uci->mc' as I am going to remove it completely in the next patch. Signed-off-by: Chao Gao --- Changes in v7: - reworked to cache only one microcode patch rather than a list of microcode patches. Changes in v6: - constify local variables and function parameters if possible - comment that the global cache is protected by 'microcode_mutex'. and add assertions to catch violations in microcode_{save/find}_patch() Changes in v5: - reword the commit description - find_patch() and save_patch() are abstracted into common functions with some hooks for AMD and Intel --- xen/arch/x86/microcode.c| 36 xen/arch/x86/microcode_amd.c| 91 + xen/arch/x86/microcode_intel.c | 77 +++--- xen/include/asm-x86/microcode.h | 15 +++ 4 files changed, 197 insertions(+), 22 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 4163f50..cff86a9 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -61,6 +61,9 @@ static struct ucode_mod_blob __initdata ucode_blob; */ static bool_t __initdata ucode_scan; +/* Protected by microcode_mutex */ +static struct microcode_patch *microcode_cache; + void __init microcode_set_module(unsigned int idx) { ucode_mod_idx = idx; @@ -262,6 +265,39 @@ int microcode_resume_cpu(unsigned int cpu) return err; } +const struct microcode_patch *microcode_get_cache(void) +{ +ASSERT(spin_is_locked(&microcode_mutex)); + +return microcode_cache; +} + +/* Return true if cache gets updated.
Otherwise, return false */ +bool microcode_update_cache(struct microcode_patch *patch) +{ + +ASSERT(spin_is_locked(&microcode_mutex)); + +if ( !microcode_ops->match_cpu(patch) ) +return false; + +if ( !microcode_cache ) +microcode_cache = patch; +else if ( microcode_ops->compare_patch(patch, microcode_cache) == + NEW_UCODE ) +{ +microcode_ops->free_patch(microcode_cache); +microcode_cache = patch; +} +else +{ +microcode_ops->free_patch(patch); +return false; +} + +return true; +} + static int microcode_update_cpu(const void *buf, size_t size) { int err; diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 7a854c0..1f05899 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -190,24 +190,85 @@ static bool_t microcode_fits(const struct microcode_amd *mc_amd, return 1; } +static bool match_cpu(const struct microcode_patch *patch) +{ +if ( !patch ) +return false; +return microcode_fits(patch->mc_amd, smp_processor_id()); +} + +static struct microcode_patch *alloc_microcode_patch( +const struct microcode_amd *mc_amd) +{ +struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch); +struct microcode_amd *cache = xmalloc(struct microcode_amd); +void *mpb = xmalloc_bytes(mc_amd->mpb_size); +struct equiv_cpu_entry *equiv_cpu_table = +xmalloc_bytes(mc_amd->equiv_cpu_table_size); + +if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table ) +{ +xfree(microcode_patch); +xfree(cache); +xfree(mpb); +xfree(equiv_cpu_table); +printk(XENLOG_ERR "microcode: Can not allocate memory\n"); +return ERR_PTR(-ENOMEM); +} + +cache->equiv_cpu_table = equiv_cpu_table; +cache->mpb = mpb; +memcpy(cache->equiv_cpu_table, mc_amd->equiv_cpu_table, + mc_amd->equiv_cpu_table_size); +memcpy(cache->mpb, mc_amd->mpb, mc_amd->mpb_size); +cache->equiv_cpu_table_size = mc_amd->equiv_cpu_table_size; +cache->mpb_size = mc_amd->mpb_size; +microcode_patch->mc_amd = cache; + +return microcode_patch; +} + +static void free_patch(struct microcode_patch *microcode_patch) +{ +struct microcode_amd *mc_amd = microcode_patch->mc_amd; + +xfree(m
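The replacement policy can be read off the hunks above; here it is as a compilable user-space reduction (simplified types; malloc/free stand in for xmalloc/xfree, and a bare revision compare stands in for the vendor compare_patch hook):

    #include <stdbool.h>
    #include <stdlib.h>

    enum microcode_match_result { OLD_UCODE, NEW_UCODE, MIS_UCODE };

    struct microcode_patch { unsigned int rev; };

    /* One cached patch for the whole system; protected by
     * microcode_mutex in Xen. */
    static struct microcode_patch *microcode_cache;

    static enum microcode_match_result compare_patch(
        const struct microcode_patch *new, const struct microcode_patch *old)
    {
        return new->rev > old->rev ? NEW_UCODE : OLD_UCODE;
    }

    /* Takes ownership of 'patch': install it (freeing the older cache)
     * or free it.  Returns true if the cache was updated. */
    static bool microcode_update_cache(struct microcode_patch *patch)
    {
        if ( !microcode_cache )
            microcode_cache = patch;
        else if ( compare_patch(patch, microcode_cache) == NEW_UCODE )
        {
            free(microcode_cache);
            microcode_cache = patch;
        }
        else
        {
            free(patch);
            return false;
        }

        return true;
    }

    int main(void)
    {
        struct microcode_patch *a = malloc(sizeof(*a));
        struct microcode_patch *b = malloc(sizeof(*b));

        a->rev = 0x20;
        b->rev = 0x25;
        microcode_update_cache(a);  /* installed: cache was empty */
        microcode_update_cache(b);  /* installed: newer, 'a' is freed */
        return 0;
    }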
[Xen-devel] [PATCH v7 10/10] x86/microcode: always collect_cpu_info() during boot
From: Sergey Dyasli Currently, the cpu_sig struct is not updated during boot when either: 1. ucode_scan is set to false (e.g. no "ucode=scan" in cmdline) 2. initrd does not contain a microcode blob These will result in cpu_sig.rev being 0, which affects APIC's check_deadline_errata() and retpoline_safe() functions. Fix this by getting the ucode revision early during boot and SMP bring-up. While at it. Signed-off-by: Sergey Dyasli Signed-off-by: Chao Gao --- changes in v7: - rebase on patch 1~9 --- xen/arch/x86/microcode.c | 4 1 file changed, 4 insertions(+) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index f4a417e..8aeb152 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -590,6 +590,10 @@ int __init early_microcode_init(void) if ( microcode_ops ) { +rc = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); +if ( rc ) +return rc; + if ( ucode_mod.mod_end || ucode_blob.size ) rc = early_microcode_parse_and_update_cpu(); } -- 1.8.3.1
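A toy illustration of the bug being fixed (made-up revision numbers; the real checks live in check_deadline_errata() and retpoline_safe(), which are more involved): boot-time mitigation logic compares cpu_sig.rev against known-good revisions, so an uninitialised 0 makes a perfectly safe ucode level look unsafe:

    #include <stdbool.h>
    #include <stdio.h>

    static unsigned int cpu_sig_rev; /* stays 0 unless collect_cpu_info() ran */

    static bool ucode_at_least(unsigned int safe_rev)
    {
        return cpu_sig_rev >= safe_rev;
    }

    int main(void)
    {
        printf("without the fix: safe=%d\n", ucode_at_least(0x25)); /* 0 */

        cpu_sig_rev = 0x27;         /* early collect_cpu_info() has run */
        printf("with the fix:    safe=%d\n", ucode_at_least(0x25)); /* 1 */
        return 0;
    }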
[Xen-devel] [PATCH v7 06/10] microcode: split out apply_microcode() from cpu_request_microcode()
During late microcode update, apply_microcode() is invoked in cpu_request_microcode(). To make late microcode update more reliable, we want to put the apply_microcode() into stop_machine context. So we split it out from cpu_request_microcode(). As a consequence, apply_microcode() should be invoked explicitly in the common code. Previously, apply_microcode() got the microcode patch to be applied from the microcode cache. Now, the patch is passed as a function argument, and a patch is cached for cpu-hotplug and cpu resuming only after it has been loaded to a cpu without any error. As a consequence, the 'match_cpu' check in microcode_update_cache is removed, which otherwise would fail. Assuming that all CPUs have the same signature, one patch matching the current CPU should match the others. Then parsing microcode only needs to be done once; cpu_request_microcode() is also moved out of microcode_update_cpu(). On the AMD side, svm_host_osvw_init() is supposed to be called after microcode update. As apply_microcode() won't be called by cpu_request_microcode() now, svm_host_osvw_init() is moved to the end of apply_microcode(). Signed-off-by: Chao Gao --- Changes in v7: - to handle load failure, unvalidated patches won't be cached. They are passed as function arguments. So if the update fails, no cleanup of the microcode cache is needed. - microcode_info, which passed the microcode blob to be parsed to each CPU, is replaced by microcode_patch. Changes in v6: - during early microcode update, BSP and APs call different functions. Thus the AP can bypass parsing the microcode blob. --- xen/arch/x86/acpi/power.c | 2 +- xen/arch/x86/microcode.c| 209 ++-- xen/arch/x86/microcode_amd.c| 41 xen/arch/x86/microcode_intel.c | 69 ++--- xen/arch/x86/smpboot.c | 5 +- xen/include/asm-x86/microcode.h | 8 +- xen/include/asm-x86/processor.h | 3 +- 7 files changed, 193 insertions(+), 144 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 4f21903..9583172 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -253,7 +253,7 @@ static int enter_state(u32 state) console_end_sync(); -microcode_resume_cpu(); +early_microcode_update_cpu(); if ( !recheck_cpu_features(0) ) panic("Missing previously available feature(s)\n"); diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 16a6d50..23cf550 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -189,36 +189,62 @@ static DEFINE_SPINLOCK(microcode_mutex); DEFINE_PER_CPU(struct cpu_signature, cpu_sig); -struct microcode_info { -unsigned int cpu; -uint32_t buffer_size; -int error; -char buffer[1]; -}; +/* + * Return the patch with the highest revision id among all matching + * patches in the blob. Return NULL if no suitable patch. + */ +static struct microcode_patch *microcode_parse_blob(const char *buf, +uint32_t len) +{ +if ( likely(!microcode_ops->collect_cpu_info(&this_cpu(cpu_sig))) ) +return microcode_ops->cpu_request_microcode(buf, len); -int microcode_resume_cpu(void) +return NULL; +} + +/* + * Load a microcode update to current CPU. + * + * If no patch is provided, the cached patch will be loaded. Microcode update + * during APs bringup and CPU resuming falls into this case.
+ */ +static int microcode_update_cpu(struct microcode_patch *patch) { -int err; -struct cpu_signature *sig = &this_cpu(cpu_sig); +int ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); -if ( !microcode_ops ) -return 0; +if ( unlikely(ret) ) +return ret; spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(sig); -if ( likely(!err) ) -err = microcode_ops->apply_microcode(); -spin_unlock(&microcode_mutex); +if ( patch ) +{ +/* + * If a patch is specified, it should have a newer revision than + * that of the cached patch. + */ +if ( microcode_cache && + microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE ) +{ +spin_unlock(&microcode_mutex); +return -EINVAL; +} -return err; -} +ret = microcode_ops->apply_microcode(patch); +} +else if ( microcode_cache ) +{ +ret = microcode_ops->apply_microcode(microcode_cache); +if ( ret == -EIO ) +printk("Update failed. Reboot needed\n"); +} +else +/* No patch to update */ +ret = -EINVAL; -const struct microcode_patch *microcode_get_cache(void) -{ -ASSERT(spin_is_locked(&microcode_mutex)); +spin_unlock(&microcode_mutex); -return microcode_cache; +return ret; } /* Return true if cache gets updated. Otherwise, retu
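The control flow that results from this split — parse the blob once, then apply the same patch object explicitly — reduces to the following sketch (user-space stubs; locking, error paths and the stop_machine plumbing added in patch 08 are omitted):

    #include <stddef.h>
    #include <stdio.h>

    struct microcode_patch { unsigned int rev; };

    /* Stub for microcode_ops->cpu_request_microcode(buf, len). */
    static struct microcode_patch *parse_blob(const char *buf, size_t len)
    {
        static struct microcode_patch p = { 0x25 };

        (void)buf; (void)len;       /* parsing elided in this sketch */
        return &p;
    }

    /* Stub for microcode_ops->apply_microcode(patch). */
    static int apply_microcode(const struct microcode_patch *patch)
    {
        printf("loading revision %#x\n", patch->rev);
        return 0;
    }

    int main(void)
    {
        char blob[16] = { 0 };      /* pretend ucode image */
        struct microcode_patch *patch = parse_blob(blob, sizeof(blob));
        unsigned int cpu;

        /* One parse, then an explicit apply per CPU (per core really). */
        for ( cpu = 0; cpu < 4; cpu++ )
            apply_microcode(patch);
        return 0;
    }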
[Xen-devel] [PATCH v7 04/10] microcode: remove struct ucode_cpu_info
We can remove the per-cpu cache field in struct ucode_cpu_info since it has been replaced by a global cache. That leaves only one field remaining in ucode_cpu_info, so this struct is removed and the remaining field (the cpu signature) is stored in the per-cpu area. Also remove 'microcode_resume_match' from microcode_ops because the check is done in find_patch(). The cpu status notifier is also removed. It was used to free the "mc" field to avoid a memory leak. Signed-off-by: Chao Gao --- Changes in v6: - remove the whole struct ucode_cpu_info instead of the per-cpu cache in it. --- xen/arch/x86/apic.c | 2 +- xen/arch/x86/microcode.c| 91 +++- xen/arch/x86/microcode_amd.c| 100 +--- xen/arch/x86/microcode_intel.c | 33 - xen/arch/x86/spec_ctrl.c| 2 +- xen/include/asm-x86/microcode.h | 13 +- 6 files changed, 30 insertions(+), 211 deletions(-) diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index fafc0bd..d216455 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -1188,7 +1188,7 @@ static void __init check_deadline_errata(void) else rev = (unsigned long)m->driver_data; -if ( this_cpu(ucode_cpu_info).cpu_sig.rev >= rev ) +if ( this_cpu(cpu_sig).rev >= rev ) return; setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE); diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index cff86a9..0c01dfa 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -187,7 +187,7 @@ const struct microcode_ops *microcode_ops; static DEFINE_SPINLOCK(microcode_mutex); -DEFINE_PER_CPU(struct ucode_cpu_info, ucode_cpu_info); +DEFINE_PER_CPU(struct cpu_signature, cpu_sig); struct microcode_info { unsigned int cpu; @@ -196,70 +196,19 @@ struct microcode_info { char buffer[1]; }; -static void __microcode_fini_cpu(unsigned int cpu) -{ -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); - -xfree(uci->mc.mc_valid); -memset(uci, 0, sizeof(*uci)); -} - -static void microcode_fini_cpu(unsigned int cpu) -{ -spin_lock(&microcode_mutex); -__microcode_fini_cpu(cpu); -spin_unlock(&microcode_mutex); -} - int microcode_resume_cpu(unsigned int cpu) { int err; -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); -struct cpu_signature nsig; -unsigned int cpu2; +struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); if ( !microcode_ops ) return 0; spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig); -if ( err ) -{ -__microcode_fini_cpu(cpu); -spin_unlock(&microcode_mutex); -return err; -} - -if ( uci->mc.mc_valid ) -{ -err = microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid); -if ( err >= 0 ) -{ -if ( err ) -err = microcode_ops->apply_microcode(cpu); -spin_unlock(&microcode_mutex); -return err; -} -} - -nsig = uci->cpu_sig; -__microcode_fini_cpu(cpu); -uci->cpu_sig = nsig; - -err = -EIO; -for_each_online_cpu ( cpu2 ) -{ -uci = &per_cpu(ucode_cpu_info, cpu2); -if ( uci->mc.mc_valid && - microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid) > 0 ) -{ -err = microcode_ops->apply_microcode(cpu); -break; -} -} - -__microcode_fini_cpu(cpu); +err = microcode_ops->collect_cpu_info(cpu, sig); +if ( likely(!err) ) +err = microcode_ops->apply_microcode(cpu); spin_unlock(&microcode_mutex); return err; @@ -302,16 +251,13 @@ static int microcode_update_cpu(const void *buf, size_t size) { int err; unsigned int cpu = smp_processor_id(); -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); +struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); spin_lock(&microcode_mutex); -err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig); +err = microcode_ops->collect_cpu_info(cpu, sig); if
( likely(!err) ) err = microcode_ops->cpu_request_microcode(cpu, buf, size); -else -__microcode_fini_cpu(cpu); - spin_unlock(&microcode_mutex); return err; @@ -398,25 +344,6 @@ static int __init microcode_init(void) } __initcall(microcode_init); -static int microcode_percpu_callback( -struct notifier_block *nfb, unsigned long action, void *hcpu) -{ -unsigned int cpu = (unsigned long)hcpu; - -switch ( action ) -{ -case CPU_DEAD: -microcode_fini_cpu(cpu); -break; -} - -return NOTIFY_DONE; -} - -static struct notifier_block microcode_percpu_nfb = { -.notifier_call = microcode_percpu_callback, -}; - int __init early_microcode_update_cpu(bool start_
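As a compilable before/after declaration sketch (field types simplified; Xen's DEFINE_PER_CPU modelled as an array of an assumed 64 CPUs), the net effect of the patch on the data layout:

    struct cpu_signature { unsigned int sig, pf, rev; };

    /* Before: a per-CPU wrapper also carrying a private blob cache. */
    struct ucode_cpu_info_before {
        struct cpu_signature cpu_sig;
        void *mc_valid;             /* per-CPU ucode blob, now removed */
    };

    /* After: the blob lives in the single global microcode_cache, so
     * only the signature remains per-CPU:
     *     DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
     */
    static struct cpu_signature cpu_sig[64];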
[Xen-devel] [PATCH v7 07/10] microcode/intel: Writeback and invalidate caches before updating microcode
Updating microcode is less error prone when caches have been flushed and depending on what exactly the microcode is updating. For example, some of the issues around certain Broadwell parts can be addressed by doing a full cache flush. With parallel microcode update, the cost of this patch is hardly noticeable. Although only BDX with an old microcode needs this fix, we would like to avoid future issues in case they come up later for other reasons. [linux commit: 91df9fdf51492aec9fed6b4cbd33160886740f47] Signed-off-by: Chao Gao Cc: Ashok Raj --- Changes in v7: - explain why we do 'wbinvd' unconditionally rather than only for BDX in the commit message Changes in v6: - new --- xen/arch/x86/microcode_intel.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index 650495d..bfb48ce 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -310,6 +310,12 @@ static int apply_microcode(const struct microcode_patch *patch) /* serialize access to the physical write to MSR 0x79 */ spin_lock_irqsave(&microcode_update_lock, flags); +/* + * Writeback and invalidate caches before updating microcode to avoid + * internal issues depending on what the microcode is updating. + */ +wbinvd(); + /* write microcode via MSR 0x79 */ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits); wrmsrl(MSR_IA32_UCODE_REV, 0x0ULL); -- 1.8.3.1
[Xen-devel] [PATCH v7 02/10] microcode/intel: extend microcode_update_match()
to a more generic function. Then, this function can compare two given microcodes' signature/revision as well. Comparing two microcodes is used to update the global microcode cache (introduced by the later patches in this series) when a new microcode is given. Note that since enum microcode_match_result will be used in common code (aka microcode.c), it has been placed in the common header. Signed-off-by: Chao Gao Reviewed-by: Roger Pau Monné --- Changes in v6: - eliminate unnecessary type casting in microcode_update_match - check if a patch has an extended header Changes in v5: - constify the extended_signature - use named enum type for the return value of microcode_update_match --- xen/arch/x86/microcode_intel.c | 48 +++-- xen/include/asm-x86/microcode.h | 6 ++ 2 files changed, 28 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index 22fdeca..ecec83b 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -134,14 +134,28 @@ static int collect_cpu_info(unsigned int cpu_num, struct cpu_signature *csig) return 0; } -static inline int microcode_update_match( -unsigned int cpu_num, const struct microcode_header_intel *mc_header, -int sig, int pf) +static enum microcode_match_result microcode_update_match( +const struct microcode_header_intel *mc_header, unsigned int sig, +unsigned int pf, unsigned int rev) { -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num); +const struct extended_sigtable *ext_header; +const struct extended_signature *ext_sig; +unsigned long data_size = get_datasize(mc_header); +unsigned int i; + +if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) ) +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; -return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) && -(mc_header->rev > uci->cpu_sig.rev)); +if ( get_totalsize(mc_header) == (data_size + MC_HEADER_SIZE) ) +return MIS_UCODE; + +ext_header = (const void *)(mc_header + 1) + data_size; +ext_sig = (const void *)(ext_header + 1); +for ( i = 0; i < ext_header->count; i++ ) +if ( sigmatch(sig, ext_sig[i].sig, pf, ext_sig[i].pf) ) +return (mc_header->rev > rev) ?
NEW_UCODE : OLD_UCODE; + +return MIS_UCODE; } static int microcode_sanity_check(void *mc) @@ -243,31 +257,13 @@ static int get_matching_microcode(const void *mc, unsigned int cpu) { struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); const struct microcode_header_intel *mc_header = mc; -const struct extended_sigtable *ext_header; unsigned long total_size = get_totalsize(mc_header); -int ext_sigcount, i; -struct extended_signature *ext_sig; void *new_mc; -if ( microcode_update_match(cpu, mc_header, -mc_header->sig, mc_header->pf) ) -goto find; - -if ( total_size <= (get_datasize(mc_header) + MC_HEADER_SIZE) ) +if ( microcode_update_match(mc, uci->cpu_sig.sig, uci->cpu_sig.pf, +uci->cpu_sig.rev) != NEW_UCODE ) return 0; -ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE; -ext_sigcount = ext_header->count; -ext_sig = (void *)ext_header + EXT_HEADER_SIZE; -for ( i = 0; i < ext_sigcount; i++ ) -{ -if ( microcode_update_match(cpu, mc_header, -ext_sig->sig, ext_sig->pf) ) -goto find; -ext_sig++; -} -return 0; - find: pr_debug("microcode: CPU%d found a matching microcode update with" " version %#x (current=%#x)\n", cpu, mc_header->rev, uci->cpu_sig.rev); diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h index 23ea954..73ebe9a 100644 --- a/xen/include/asm-x86/microcode.h +++ b/xen/include/asm-x86/microcode.h @@ -3,6 +3,12 @@ #include +enum microcode_match_result { +OLD_UCODE, /* signature matched, but revision id isn't newer */ +NEW_UCODE, /* signature matched, but revision id is newer */ +MIS_UCODE, /* signature mismatched */ +}; + struct cpu_signature; struct ucode_cpu_info; -- 1.8.3.1
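Because the helper now takes (sig, pf, rev) explicitly, the same call ranks a patch against the running CPU or against another patch. A compilable user-space illustration (made-up signature values; sigmatch() here is a simplified stand-in for the real platform-flags logic):

    #include <stdio.h>

    enum microcode_match_result { OLD_UCODE, NEW_UCODE, MIS_UCODE };

    struct hdr { unsigned int sig, pf, rev; };

    static int sigmatch(unsigned int s1, unsigned int s2,
                        unsigned int p1, unsigned int p2)
    {
        return s1 == s2 && (p1 & p2);
    }

    static enum microcode_match_result update_match(
        const struct hdr *mc, unsigned int sig, unsigned int pf,
        unsigned int rev)
    {
        if ( !sigmatch(sig, mc->sig, pf, mc->pf) )
            return MIS_UCODE;
        return mc->rev > rev ? NEW_UCODE : OLD_UCODE;
    }

    int main(void)
    {
        struct hdr cached = { 0x000306c3, 0x2, 0x20 };
        struct hdr incoming = { 0x000306c3, 0x2, 0x25 };

        /* vs the CPU's signature ... */
        printf("%d\n", update_match(&incoming, 0x000306c3, 0x2, 0x19));
        /* ... and vs the cached patch, with the same helper. */
        printf("%d\n", update_match(&incoming, cached.sig, cached.pf,
                                    cached.rev));
        return 0;
    }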
[Xen-devel] [PATCH v7 01/10] misc/xen-ucode: Upload a microcode blob to the hypervisor
This patch provides a tool for late microcode update. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Chao Gao --- Changes in v7: - introduce xc_microcode_update() rather than xc_platform_op() - avoid creating bounce buffer twice - rename xenmicrocode to xen-ucode, following naming tradition of other tools there. --- tools/libxc/include/xenctrl.h | 1 + tools/libxc/xc_misc.c | 23 + tools/misc/Makefile | 4 +++ tools/misc/xen-ucode.c| 78 +++ 4 files changed, 106 insertions(+) create mode 100644 tools/misc/xen-ucode.c diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 538007a..6d80ae5 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1244,6 +1244,7 @@ typedef uint32_t xc_node_to_node_dist_t; int xc_physinfo(xc_interface *xch, xc_physinfo_t *info); int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, xc_cputopo_t *cputopo); +int xc_microcode_update(xc_interface *xch, const void *buf, size_t len); int xc_numainfo(xc_interface *xch, unsigned *max_nodes, xc_meminfo_t *meminfo, uint32_t *distance); int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs, diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 5e6714a..85538e0 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -226,6 +226,29 @@ int xc_physinfo(xc_interface *xch, return 0; } +int xc_microcode_update(xc_interface *xch, const void *buf, size_t len) +{ +int ret; +DECLARE_PLATFORM_OP; +DECLARE_HYPERCALL_BUFFER(struct xenpf_microcode_update, uc); + +uc = xc_hypercall_buffer_alloc(xch, uc, len); +if (uc == NULL) +return -1; + +memcpy(uc, buf, len); + +platform_op.cmd = XENPF_microcode_update; +platform_op.u.microcode.length = len; +set_xen_guest_handle(platform_op.u.microcode.data, uc); + +ret = do_platform_op(xch, &platform_op); + +xc_hypercall_buffer_free(xch, uc); + +return ret; +} + int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, xc_cputopo_t *cputopo) { diff --git a/tools/misc/Makefile b/tools/misc/Makefile index d4320dc..63947bf 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -22,6 +22,7 @@ INSTALL_SBIN-$(CONFIG_X86) += xen-hvmcrash INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx INSTALL_SBIN-$(CONFIG_X86) += xen-lowmemd INSTALL_SBIN-$(CONFIG_X86) += xen-mfndump +INSTALL_SBIN-$(CONFIG_X86) += xen-ucode INSTALL_SBIN += xencov INSTALL_SBIN += xenlockprof INSTALL_SBIN += xenperf @@ -113,4 +114,7 @@ xen-lowmemd: xen-lowmemd.o xencov: xencov.o $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) +xen-ucode: xen-ucode.o + $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) + -include $(DEPS_INCLUDE) diff --git a/tools/misc/xen-ucode.c b/tools/misc/xen-ucode.c new file mode 100644 index 000..da668ca --- /dev/null +++ b/tools/misc/xen-ucode.c @@ -0,0 +1,78 @@ +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +void show_help(void) +{ +fprintf(stderr, +"xenmicrocode: Xen microcode updating tool\n" +"Usage: xenmicrocode \n"); +} + +int main(int argc, char *argv[]) +{ +int fd, len, ret; +char *filename, *buf; +struct stat st; +xc_interface *xch; + +if (argc < 2) +{ +show_help(); +return 0; +} + +filename = argv[1]; +fd = open(filename, O_RDONLY); +if (fd < 0) { +fprintf(stderr, "Could not open %s. (err: %s)\n", +filename, strerror(errno)); +return errno; +} + +if (stat(filename, &st) != 0) { +fprintf(stderr, "Could not get the size of %s. 
(err: %s)\n", +filename, strerror(errno)); +return errno; +} + +len = st.st_size; +buf = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0); +if (buf == MAP_FAILED) { +fprintf(stderr, "mmap failed. (error: %s)\n", strerror(errno)); +return errno; +} + +xch = xc_interface_open(0,0,0); +if (xch == NULL) +{ +fprintf(stderr, "Error opening xc interface. (err: %s)\n", +strerror(errno)); +return errno; +} + +ret = xc_microcode_update(xch, buf, len); +if (ret) +fprintf(stderr, "Failed to update microcode. (err: %s)\n", +strerror(errno)); + +xc_interface_close(xch); + +if (munmap(buf, len)) { +printf("Could not unmap: %d(%s)\n", errno, strerror(errno)); +return errno; +} +close(fd); + +return 0; +} -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 08/10] x86/microcode: Synchronize late microcode loading
This patch ports microcode improvement patches from the Linux kernel. Before you read any further: the early loading method is still the preferred one and you should always do that. The following patch is improving the late loading mechanism for long running jobs and cloud use cases. Gather all cores and serialize the microcode update on them by doing it one-by-one to make the late update process as reliable as possible and avoid potential issues caused by the microcode update. Signed-off-by: Chao Gao Tested-by: Chao Gao [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff] [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7] Cc: Kevin Tian Cc: Jun Nakajima Cc: Ashok Raj Cc: Borislav Petkov Cc: Thomas Gleixner Cc: Andrew Cooper Cc: Jan Beulich --- Changes in v7: - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int. - reword the comment above microcode_update_cpu() to clearly state that one thread per core should do the update. Changes in v6: - Use one timeout period for rendezvous stage and another for update stage. - scale time to wait by the number of remaining cpus to respond. It helps to find something wrong earlier and thus we can reboot the system earlier. --- xen/arch/x86/microcode.c | 171 ++- 1 file changed, 155 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 23cf550..f4a417e 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -22,6 +22,7 @@ */ #include +#include #include #include #include @@ -30,15 +31,34 @@ #include #include #include +#include #include #include #include +#include +#include #include #include #include #include +/* + * Before performing a late microcode update on any thread, we + * rendezvous all cpus in stop_machine context. The timeout for + * waiting for cpu rendezvous is 30ms. It is the timeout used by + * live patching + */ +#define MICROCODE_CALLIN_TIMEOUT_US 30000 + +/* + * Timeout for each thread to complete update is set to 1s. It is a + * conservative choice considering all possible interference (for + * instance, sometimes wbinvd takes a relatively long time). And a perfect + * timeout doesn't help a lot except an early shutdown. + */ +#define MICROCODE_UPDATE_TIMEOUT_US 1000000 + static module_t __initdata ucode_mod; static signed int __initdata ucode_mod_idx; static bool_t __initdata ucode_mod_forced; @@ -190,6 +210,12 @@ static DEFINE_SPINLOCK(microcode_mutex); DEFINE_PER_CPU(struct cpu_signature, cpu_sig); /* + * Count the CPUs that have entered, exited the rendezvous and succeeded in + * microcode update during late microcode update respectively. + */ +static atomic_t cpu_in, cpu_out, cpu_updated; + +/* * Return the patch with the highest revision id among all matching * patches in the blob. Return NULL if no suitable patch.
*/ @@ -270,31 +296,90 @@ bool microcode_update_cache(struct microcode_patch *patch) return true; } -static long do_microcode_update(void *patch) +/* Wait for CPUs to rendezvous with a timeout (us) */ +static int wait_for_cpus(atomic_t *cnt, unsigned int expect, + unsigned int timeout) { -int error, cpu; - -error = microcode_update_cpu(patch); -if ( error ) +while ( atomic_read(cnt) < expect ) { -microcode_ops->free_patch(microcode_cache); -return error; +if ( !timeout ) +{ +printk("CPU%d: Timeout when waiting for CPUs calling in\n", + smp_processor_id()); +return -EBUSY; +} +udelay(1); +timeout--; } +return 0; +} -cpu = cpumask_next(smp_processor_id(), &cpu_online_map); -if ( cpu < nr_cpu_ids ) -return continue_hypercall_on_cpu(cpu, do_microcode_update, patch); +static int do_microcode_update(void *patch) +{ +unsigned int cpu = smp_processor_id(); +unsigned int cpu_nr = num_online_cpus(); +unsigned int finished; +int ret; +static bool error; -microcode_update_cache(patch); +atomic_inc(&cpu_in); +ret = wait_for_cpus(&cpu_in, cpu_nr, MICROCODE_CALLIN_TIMEOUT_US); +if ( ret ) +return ret; -return error; +ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); +/* + * Load microcode update on only one logical processor per core. + * Here, among logical processors of a core, the one with the + * lowest thread id is chosen to perform the loading. + */ +if ( !ret && (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))) ) +{ +ret = microcode_ops->apply_microcode(patch); +if ( !ret ) +atomic_inc(&cpu_updated); +} +/* + * Increase the wait timeout to a safe value here since we're serializing + * the microcode update and that could take a whi
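A user-space reduction of the rendezvous discipline (C11 atomics standing in for Xen's atomic_t, the udelay(1) busy-wait elided; the exact scaling of the second timeout is an assumption based on the v6 changelog note about scaling by the number of remaining CPUs):

    #include <stdatomic.h>
    #include <stddef.h>

    #define CALLIN_TIMEOUT_US 30000     /* 30ms rendezvous budget */
    #define UPDATE_TIMEOUT_US 1000000   /* 1s per updating core */

    static atomic_uint cpu_in, cpu_out;

    static int wait_for_cpus(atomic_uint *cnt, unsigned int expect,
                             unsigned int timeout)
    {
        while ( atomic_load(cnt) < expect )
        {
            if ( !timeout )
                return -1;              /* -EBUSY in Xen */
            /* udelay(1) here in Xen */
            timeout--;
        }
        return 0;
    }

    /* Per-CPU body, mirroring do_microcode_update(): check in, let one
     * thread per core load, then wait for everyone to check out. */
    static int rendezvous_body(unsigned int nr_cpus, int (*load)(void))
    {
        int ret = 0;

        atomic_fetch_add(&cpu_in, 1);
        if ( wait_for_cpus(&cpu_in, nr_cpus, CALLIN_TIMEOUT_US) )
            return -1;

        if ( load != NULL )             /* non-NULL on one thread per core */
            ret = load();

        atomic_fetch_add(&cpu_out, 1);
        if ( wait_for_cpus(&cpu_out, nr_cpus,
                           nr_cpus * UPDATE_TIMEOUT_US) ) /* assumed scale */
            return -1;
        return ret;
    }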
[Xen-devel] [PATCH v7 00/10] improve late microcode loading
Changes in version 7: - cache one microcode update rather than a list of them. Assuming that all CPUs (including those that will be plugged in later) in the system have the same signature, an update that matches one CPU should match the others. Thus, one update is enough for microcode updating during CPU hot-plug and resuming. - To handle load failure, a microcode update is cached only after it is applied, to avoid a broken update overriding a validated one. Unvalidated microcode updates are passed as arguments rather than via another global variable, which is where this series slightly differs from Roger's suggestion in: https://lists.xen.org/archives/html/xen-devel/2019-03/msg00776.html - incorporate Sergey's patch (patch 10) to fix a bug: we maintain a variable to reflect the current microcode revision. But in some cases, this variable isn't initialized during system boot time, which results in falsely reporting that the processor is susceptible to some known vulnerabilities. - fix issues reported by Sergey: https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00901.html - Responses to Sergey/Roger/Wei/Ashok's other comments. Major changes in version 6: - run wbinvd before updating microcode (patch 10) - add a userspace tool for late microcode update (patch 1) - scale time to wait by the number of remaining CPUs to respond - remove 'cpu' parameters from some related callbacks and functions - save a ucode patch only if its supported CPU is allowed to mix with the current cpu. Changes in version 5: - support parallel microcode updates for all cores (see patch 8) - Address Roger's comments on the last version. The intention of this series is to make the late microcode loading more reliable by rendezvousing all cpus in stop_machine context. This idea comes from Ashok. I am porting his linux patch to Xen (see patches 7 and 8 for more details). This series includes six changes: 1. Patch 1: a userspace tool for late microcode update 2. Patch 2-6: introduce a global microcode cache and some cleanup 3. Patch 7: writeback and invalidate cache before updating microcode 4. Patch 8: synchronize late microcode loading 5. Patch 9: support parallel microcode updates on different cores 6. Patch 10: always read microcode revision at boot time Currently, late microcode loading does a lot of things including parsing the microcode blob, checking the signature/revision and performing the update. Putting all of them into stop_machine context is a bad idea because of complexity (one issue I observed is that memory allocation triggered an assertion in stop_machine context). To simplify the load process, parsing microcode is moved out of it. The remaining parts of the load process are put into stop_machine context. Regarding changes to the AMD side, I didn't do any test for them due to lack of hardware. Sergey, could you help to test this series on an AMD machine again? At least, two basic tests are needed: * do a microcode update after system bootup * don't bring all pCPUs up at bootup by specifying the maxcpus option in the xen command line, then do a microcode update and online all offlined CPUs via 'xen-hptool'.
Chao Gao (9): misc/xen-ucode: Upload a microcode blob to the hypervisor microcode/intel: extend microcode_update_match() microcode: introduce a global cache of ucode patch microcode: remove struct ucode_cpu_info microcode: remove pointless 'cpu' parameter microcode: split out apply_microcode() from cpu_request_microcode() microcode/intel: Writeback and invalidate caches before updating microcode x86/microcode: Synchronize late microcode loading microcode: remove microcode_update_lock Sergey Dyasli (1): x86/microcode: always collect_cpu_info() during boot tools/libxc/include/xenctrl.h | 1 + tools/libxc/xc_misc.c | 23 +++ tools/misc/Makefile | 4 + tools/misc/xen-ucode.c | 78  xen/arch/x86/acpi/power.c | 2 +- xen/arch/x86/apic.c | 2 +- xen/arch/x86/microcode.c| 401  xen/arch/x86/microcode_amd.c| 245  xen/arch/x86/microcode_intel.c | 202 ++-- xen/arch/x86/smpboot.c | 5 +- xen/arch/x86/spec_ctrl.c| 2 +- xen/include/asm-x86/microcode.h | 39 ++-- xen/include/asm-x86/processor.h | 3 +- 13 files changed, 639 insertions(+), 368 deletions(-) create mode 100644 tools/misc/xen-ucode.c -- 1.8.3.1
Re: [Xen-devel] [PATCH v7 01/10] misc/xen-ucode: Upload a microcode blob to the hypervisor
On Tue, Jun 04, 2019 at 05:14:14PM +0100, Andrew Cooper wrote: >On 27/05/2019 09:31, Chao Gao wrote: >> This patch provides a tool for late microcode update. >> >> Signed-off-by: Konrad Rzeszutek Wilk >> Signed-off-by: Chao Gao >> --- >> Changes in v7: >> - introduce xc_microcode_update() rather than xc_platform_op() >> - avoid creating bounce buffer twice >> - rename xenmicrocode to xen-ucode, following naming tradition >> of other tools there. >> >> --- >> tools/libxc/include/xenctrl.h | 1 + >> tools/libxc/xc_misc.c | 23 + >> tools/misc/Makefile | 4 +++ >> tools/misc/xen-ucode.c| 78 >> +++ >> 4 files changed, 106 insertions(+) >> create mode 100644 tools/misc/xen-ucode.c >> >> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h >> index 538007a..6d80ae5 100644 >> --- a/tools/libxc/include/xenctrl.h >> +++ b/tools/libxc/include/xenctrl.h >> @@ -1244,6 +1244,7 @@ typedef uint32_t xc_node_to_node_dist_t; >> int xc_physinfo(xc_interface *xch, xc_physinfo_t *info); >> int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, >> xc_cputopo_t *cputopo); >> +int xc_microcode_update(xc_interface *xch, const void *buf, size_t len); >> int xc_numainfo(xc_interface *xch, unsigned *max_nodes, >> xc_meminfo_t *meminfo, uint32_t *distance); >> int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs, >> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c >> index 5e6714a..85538e0 100644 >> --- a/tools/libxc/xc_misc.c >> +++ b/tools/libxc/xc_misc.c >> @@ -226,6 +226,29 @@ int xc_physinfo(xc_interface *xch, >> return 0; >> } >> >> +int xc_microcode_update(xc_interface *xch, const void *buf, size_t len) >> +{ >> +int ret; >> +DECLARE_PLATFORM_OP; >> +DECLARE_HYPERCALL_BUFFER(struct xenpf_microcode_update, uc); >> + >> +uc = xc_hypercall_buffer_alloc(xch, uc, len); >> +if (uc == NULL) > >Xen style. Extra space please. 
> >> +return -1; >> + >> +memcpy(uc, buf, len); >> + >> +platform_op.cmd = XENPF_microcode_update; >> +platform_op.u.microcode.length = len; >> +set_xen_guest_handle(platform_op.u.microcode.data, uc); >> + >> +ret = do_platform_op(xch, &platform_op); >> + >> +xc_hypercall_buffer_free(xch, uc); >> + >> +return ret; >> +} >> + >> int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, >> xc_cputopo_t *cputopo) >> { >> diff --git a/tools/misc/Makefile b/tools/misc/Makefile >> index d4320dc..63947bf 100644 >> --- a/tools/misc/Makefile >> +++ b/tools/misc/Makefile >> @@ -22,6 +22,7 @@ INSTALL_SBIN-$(CONFIG_X86) += xen-hvmcrash >> INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx >> INSTALL_SBIN-$(CONFIG_X86) += xen-lowmemd >> INSTALL_SBIN-$(CONFIG_X86) += xen-mfndump >> +INSTALL_SBIN-$(CONFIG_X86) += xen-ucode >> INSTALL_SBIN += xencov >> INSTALL_SBIN += xenlockprof >> INSTALL_SBIN += xenperf >> @@ -113,4 +114,7 @@ xen-lowmemd: xen-lowmemd.o >> xencov: xencov.o >> $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) >> >> +xen-ucode: xen-ucode.o >> +$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) >> + >> -include $(DEPS_INCLUDE) >> diff --git a/tools/misc/xen-ucode.c b/tools/misc/xen-ucode.c >> new file mode 100644 >> index 000..da668ca >> --- /dev/null >> +++ b/tools/misc/xen-ucode.c >> @@ -0,0 +1,78 @@ >> +#define _GNU_SOURCE >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +void show_help(void) >> +{ >> +fprintf(stderr, >> +"xenmicrocode: Xen microcode updating tool\n" >> +"Usage: xenmicrocode \n"); > >s/xenmicrocode/xen-ucode/ > >Both can be fixed on commit > >Acked-by: Andrew Cooper Thanks. As Jan said, it is better to use argv[0] here. Chao
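The argv[0] suggestion amounts to the following (sketch):

    #include <stdio.h>

    static void show_help(const char *argv0)
    {
        fprintf(stderr,
                "%s: Xen microcode updating tool\n"
                "Usage: %s <microcode blob>\n", argv0, argv0);
    }

    int main(int argc, char *argv[])
    {
        if ( argc < 2 )
            show_help(argv[0]);
        return 0;
    }

so the usage text stays correct however the binary is named or installed.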
Re: [Xen-devel] [PATCH v7 02/10] microcode/intel: extend microcode_update_match()
On Tue, Jun 04, 2019 at 08:39:15AM -0600, Jan Beulich wrote: On 27.05.19 at 10:31, wrote: >> --- a/xen/arch/x86/microcode_intel.c >> +++ b/xen/arch/x86/microcode_intel.c >> @@ -134,14 +134,28 @@ static int collect_cpu_info(unsigned int cpu_num, >> struct cpu_signature *csig) >> return 0; >> } >> >> -static inline int microcode_update_match( >> -unsigned int cpu_num, const struct microcode_header_intel *mc_header, >> -int sig, int pf) >> +static enum microcode_match_result microcode_update_match( >> +const struct microcode_header_intel *mc_header, unsigned int sig, >> +unsigned int pf, unsigned int rev) >> { >> -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num); >> +const struct extended_sigtable *ext_header; >> +const struct extended_signature *ext_sig; >> +unsigned long data_size = get_datasize(mc_header); >> +unsigned int i; >> + >> +if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) ) >> +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; > >As indicated before, I think you would better also provide an "equal" >indication. Iirc I've told you that I have one system where the cores >get handed over from the BIOS in an inconsistent state (only core >has ucode loaded). Hence we'd want to be able to also _store_ >ucode matching that found on CPU 0, without actually want to _load_ >it there. Will do. What if no microcode update is provided in this case? Shall we refuse to boot? If we allow different microcode revisions in the system, it would complicate late microcode loading. > >> -return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) && >> -(mc_header->rev > uci->cpu_sig.rev)); >> +if ( get_totalsize(mc_header) == (data_size + MC_HEADER_SIZE) ) >> +return MIS_UCODE; > >Okay, you're tightening the original <= to == here. But if you're >already tightening things, why don't you make sure you actually >have enough data to ... > >> +ext_header = (const void *)(mc_header + 1) + data_size; > >... hold an extended header, and then also to hold ... > >> +ext_sig = (const void *)(ext_header + 1); >> +for ( i = 0; i < ext_header->count; i++ ) >> +if ( sigmatch(sig, ext_sig[i].sig, pf, ext_sig[i].pf) ) >> +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; > >... enough array elements? Do you think below incremental change is fine? diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c index 94a1561..3dcbd28 100644 --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -138,18 +138,25 @@ static enum microcode_match_result microcode_update_match( const struct extended_signature *ext_sig; unsigned long data_size = get_datasize(mc_header); unsigned int i; +const void *end = (const void *)mc_header + get_totalsize(mc_header); if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) ) return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; -if ( get_totalsize(mc_header) == (data_size + MC_HEADER_SIZE) ) -return MIS_UCODE; - ext_header = (const void *)(mc_header + 1) + data_size; ext_sig = (const void *)(ext_header + 1); -for ( i = 0; i < ext_header->count; i++ ) -if ( sigmatch(sig, ext_sig[i].sig, pf, ext_sig[i].pf) ) -return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE; + +/* + * Make sure there is enough space to hold an extended header and enough + * array elements. + */ +if ( (end >= (const void *)ext_sig) && + (end >= (const void *)(ext_sig + ext_header->count)) ) +{ +for ( i = 0; i < ext_header->count; i++ ) +if ( sigmatch(sig, ext_sig[i].sig, pf, ext_sig[i].pf) ) +return (mc_header->rev > rev) ? 
NEW_UCODE : OLD_UCODE; +} return MIS_UCODE; } Thanks Chao
Re: [Xen-devel] [PATCH v7 03/10] microcode: introduce a global cache of ucode patch
On Tue, Jun 04, 2019 at 09:03:20AM -0600, Jan Beulich wrote: On 27.05.19 at 10:31, wrote: >> +bool microcode_update_cache(struct microcode_patch *patch) >> +{ >> + >> +ASSERT(spin_is_locked(&microcode_mutex)); >> + >> +if ( !microcode_ops->match_cpu(patch) ) >> +return false; >> + >> +if ( !microcode_cache ) >> +microcode_cache = patch; >> +else if ( microcode_ops->compare_patch(patch, microcode_cache) == >> + NEW_UCODE ) >> +{ >> +microcode_ops->free_patch(microcode_cache); >> +microcode_cache = patch; >> +} > >Hmm, okay, the way you do things here three enumeration values >may indeed be sufficient. "old" may just be a little misleading then. >(As to my respective comment on the previous patch.) > >> +static struct microcode_patch *alloc_microcode_patch( >> +const struct microcode_amd *mc_amd) >> +{ >> +struct microcode_patch *microcode_patch = xmalloc(struct >> microcode_patch); >> +struct microcode_amd *cache = xmalloc(struct microcode_amd); >> +void *mpb = xmalloc_bytes(mc_amd->mpb_size); >> +struct equiv_cpu_entry *equiv_cpu_table = >> +xmalloc_bytes(mc_amd->equiv_cpu_table_size); >> + >> +if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table ) >> +{ >> +xfree(microcode_patch); >> +xfree(cache); >> +xfree(mpb); >> +xfree(equiv_cpu_table); >> +printk(XENLOG_ERR "microcode: Can not allocate memory\n"); > >I'm not convinced this needs logging. > >> +return ERR_PTR(-ENOMEM); >> +} >> + >> +cache->equiv_cpu_table = equiv_cpu_table; >> +cache->mpb = mpb; >> +memcpy(cache->equiv_cpu_table, mc_amd->equiv_cpu_table, > >Why not use the local variable here and ... > >> + mc_amd->equiv_cpu_table_size); >> +memcpy(cache->mpb, mc_amd->mpb, mc_amd->mpb_size); > >here? Less source code and presumably also slightly less binary >code. In fact I wonder if you wouldn't better memcpy() first >anyway, and only then store the values into the fields. It won't >matter much with the global lock held, but it's generally good >practice to do things in an order that won't risk to confuse >hypothetical consumers of the data. Will do. > >> +static void free_patch(struct microcode_patch *microcode_patch) >> +{ >> +struct microcode_amd *mc_amd = microcode_patch->mc_amd; >> + >> +xfree(mc_amd->equiv_cpu_table); >> +xfree(mc_amd->mpb); >> +xfree(mc_amd); >> +xfree(microcode_patch); > >I think I said so before: Freeing of the generic wrapper struct >would probably better be placed in generic code. Do you mean something as shown below: /* in generic code */ struct microcode_patch { union { struct microcode_intel *mc_intel; struct microcode_amd *mc_amd; void *mc; }; }; void microcode_free_patch(struct microcode_patch *microcode_patch) { microcode_ops->free_patch(microcode_patch->mc); xfree(microcode_patch); } /* in vendor-specific (AMD) code */ static void free_patch(void *mc) { struct microcode_amd *mc_amd = mc; xfree(mc_amd->equiv_cpu_table); xfree(mc_amd->mpb); xfree(mc_amd); } > >> @@ -497,7 +558,20 @@ static int cpu_request_microcode(unsigned int cpu, >> const void *buf, >> while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize, >> &offset)) == 0 ) >> { >> -if ( microcode_fits(mc_amd, cpu) ) >> +struct microcode_patch *new_patch = alloc_microcode_patch(mc_amd); >> + >> +if ( IS_ERR(new_patch) ) >> +{ >> +error = PTR_ERR(new_patch); >> +break; >> +} >> + >> +if ( match_cpu(new_patch) ) >> +microcode_update_cache(new_patch); >> +else >> +free_patch(new_patch); > >Why do you re-do what microcode_update_cache() already does? >It calls ->match_cpu() and ->free_patch() all by itself.
It looks as >if it would need to gain one more ->free_patch() invocation though. > Will remove both invocations of match_cpu(). To support the case (the broken bios) you described, a patch which needs to be stored doesn't have to be newer than the microcode loaded to the current CPU. As long as the processor's signature is covered by the patch, we will store the patch regardless of the revision number. Thanks Chao
Re: [Xen-devel] [PATCH v7 04/10] microcode: remove struct ucode_cpu_info
On Tue, Jun 04, 2019 at 09:13:46AM -0600, Jan Beulich wrote: On 27.05.19 at 10:31, wrote: >> We can remove the per-cpu cache field in struct ucode_cpu_info since >> it has been replaced by a global cache. That leaves only one field >> remaining in ucode_cpu_info, so this struct is removed and the >> remaining field (the cpu signature) is stored in the per-cpu area. >> >> Also remove 'microcode_resume_match' from microcode_ops because the >> check is done in find_patch(). The cpu status notifier is also >> removed. It was used to free the "mc" field to avoid a memory leak. > >There's no find_patch() function anymore afaics. And I also think this >should be a separate patch. The above isn't enough imo to justify ... > >> int microcode_resume_cpu(unsigned int cpu) >> { >> int err; >> -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); >> -struct cpu_signature nsig; >> -unsigned int cpu2; >> +struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); >> >> if ( !microcode_ops ) >> return 0; >> >> spin_lock(&microcode_mutex); >> >> -err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig); >> -if ( err ) >> -{ >> -__microcode_fini_cpu(cpu); >> -spin_unlock(&microcode_mutex); >> -return err; >> -} >> - >> -if ( uci->mc.mc_valid ) >> -{ >> -err = microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid); >> -if ( err >= 0 ) >> -{ >> -if ( err ) >> -err = microcode_ops->apply_microcode(cpu); >> -spin_unlock(&microcode_mutex); >> -return err; >> -} >> -} >> - >> -nsig = uci->cpu_sig; >> -__microcode_fini_cpu(cpu); >> -uci->cpu_sig = nsig; >> - >> -err = -EIO; >> -for_each_online_cpu ( cpu2 ) >> -{ >> -uci = &per_cpu(ucode_cpu_info, cpu2); >> -if ( uci->mc.mc_valid && >> - microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid) > >> 0 ) >> -{ >> -err = microcode_ops->apply_microcode(cpu); >> -break; >> -} >> -} > >... in particular the removal of this loop, the more that both the >loop and the code ahead of it also call ->apply_microcode(). Ok. Will split it out from this patch and refine the patch description. Basically, this function tries its best to find a suitable patch from the per-cpu cache and loads it. Currently, the per-cpu cache is replaced by the global cache, and ->apply_microcode() loads the global cache rather than the per-cpu cache. Hence, a simple invocation of ->apply_microcode() is enough to apply the global cache during CPU hotplug or resuming from hibernation. > >> @@ -281,7 +281,6 @@ static enum microcode_match_result compare_patch( >> */ >> static int get_matching_microcode(const void *mc, unsigned int cpu) >> { >> -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); > >Note how this was using "cpu". > >> @@ -308,17 +307,7 @@ static int get_matching_microcode(const void *mc, >> unsigned int cpu) >> >> pr_debug("microcode: CPU%d found a matching microcode update with" >> " version %#x (current=%#x)\n", >> - cpu, mc_header->rev, uci->cpu_sig.rev); >> -new_mc = xmalloc_bytes(total_size); >> -if ( new_mc == NULL ) >> -{ >> -printk(KERN_ERR "microcode: error! Can not allocate memory\n"); >> -return -ENOMEM; >> -} >> - >> -memcpy(new_mc, mc, total_size); >> -xfree(uci->mc.mc_intel); >> -uci->mc.mc_intel = new_mc; >> + cpu, mc_header->rev, this_cpu(cpu_sig).rev); > >Why "this_cpu()" here? It should be a part of next patch. Thanks Chao
Re: [Xen-devel] [PATCH v7 05/10] microcode: remove pointless 'cpu' parameter
On Tue, Jun 04, 2019 at 09:29:34AM -0600, Jan Beulich wrote: On 27.05.19 at 10:31, wrote: >> --- a/xen/arch/x86/microcode_amd.c >> +++ b/xen/arch/x86/microcode_amd.c >> @@ -78,8 +78,9 @@ struct mpbhdr { >> static DEFINE_SPINLOCK(microcode_update_lock); >> >> /* See comment in start_update() for cases when this routine fails */ >> -static int collect_cpu_info(unsigned int cpu, struct cpu_signature *csig) >> +static int collect_cpu_info(struct cpu_signature *csig) >> { >> +unsigned int cpu = smp_processor_id(); >> struct cpuinfo_x86 *c = &cpu_data[cpu]; > >I think it would be more clear if you used current_cpu_data here. >The only other use of "cpu" is in a pr_debug(), which by default >expands to nothing anyway, and hence is cheap to change to >use smp_processor_id() instead. Will do. > >> @@ -435,14 +429,14 @@ static const unsigned int final_levels[] = { >> 0x01af >> }; >> >> -static bool_t check_final_patch_levels(unsigned int cpu) >> +static bool check_final_patch_levels(void) >> { >> /* >> * Check the current patch levels on the cpu. If they are equal to >> * any of the 'final_levels', then we should not update the microcode >> * patch on the cpu as system will hang otherwise. >> */ >> -const struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); >> +const struct cpu_signature *sig = &this_cpu(cpu_sig); >> unsigned int i; > >I don't see any dependency of this function upon running on >the subject CPU. Ok. I will drop this change. > >> @@ -279,12 +278,13 @@ static enum microcode_match_result compare_patch( >> * return 1 - found update >> * return < 0 - error >> */ >> -static int get_matching_microcode(const void *mc, unsigned int cpu) >> +static int get_matching_microcode(const void *mc) >> { >> const struct microcode_header_intel *mc_header = mc; >> unsigned long total_size = get_totalsize(mc_header); >> void *new_mc = xmalloc_bytes(total_size); >> struct microcode_patch *new_patch = xmalloc(struct microcode_patch); >> +unsigned int __maybe_unused cpu = smp_processor_id(); > >The __maybe_unused is for the sole use in pr_debug()? Please >instead use smp_processor_id() there, if so. Will do. Thanks Chao
Re: [Xen-devel] [PATCH v7 06/10] microcode: split out apply_microcode() from cpu_request_microcode()
On Wed, Jun 05, 2019 at 06:37:27AM -0600, Jan Beulich wrote: On 27.05.19 at 10:31, wrote: >> During late microcode update, apply_microcode() is invoked in >> cpu_request_microcode(). To make late microcode update more reliable, >> we want to put the apply_microcode() into stop_machine context. So >> we split it out from cpu_request_microcode(). As a consequence, >> apply_microcode() should be invoked explicitly in the common code. >> >> Previously, apply_microcode() got the microcode patch to be applied from >> the microcode cache. Now, the patch is passed as a function argument and >> a patch is cached for cpu-hotplug and cpu resuming, only after it has >> been loaded to a cpu without any error. As a consequence, the >> 'match_cpu' check in microcode_update_cache is removed, which otherwise >> would fail. > >The "only after it has been loaded to a cpu without any error" is a >problem, precisely for the case where ucode on the different cores >is not in sync initially. I would actually like to put up this question: >When a core has no ucode loaded at all yet and only strictly older >(than loaded on some other cores) ucode is found to be available, >whether then it wouldn't still be better to apply that ucode to >_at least_ the cores that have none loaded yet. Yes, it is better for this special case. And I agree to support this case. Thus in v7, a patch is loaded only if its revision is newer than that loaded to the current CPU. And it is stored only if it has been loaded successfully. But, as you described, a broken bios might put the system in an inconsistent state (multiple microcode revisions in the system) and furthermore in this case, if no or an older microcode update is provided, early loading cannot get the system into a sane state. So for both early and late microcode loading, we could face a situation where the patch to be loaded has an equal or older revision than the microcode of some CPUs. Changes I plan to make in next version are: 1. For early microcode, a patch would be stored if it covers the current CPU's signature. All CPUs would try to load from the cache. 2. For late microcode, a patch is loaded only if its revision is newer than *the patch cached*. And it is stored only if it has been loaded without an "EIO" error. 3. Cache replacement remains the same. But it is a temporary solution, especially for CSPs. A better way might be getting the newest ucode or upgrading to the newest bios, or even downgrading the bios to an older version which wouldn't put the system into an "insane" state. > >To get the system into "sane" state it may even be necessary to >downgrade ucode on the cores which did have it loaded already, >in such a situation. > >> On the AMD side, svm_host_osvw_init() is supposed to be called after >> microcode update. As apply_microcode() won't be called by >> cpu_request_microcode() now, svm_host_osvw_init() is moved to the >> end of apply_microcode(). > >I guess this really ought to become a vendor hook as well, but I >wouldn't insist on you doing so here. > >> --- a/xen/arch/x86/acpi/power.c >> +++ b/xen/arch/x86/acpi/power.c >> @@ -253,7 +253,7 @@ static int enter_state(u32 state) >> >> console_end_sync(); >> >> -microcode_resume_cpu(); >> +early_microcode_update_cpu(); > >The use here, the (changed) use in start_secondary(), and the dropping >of its __init suggest to make an attempt to find a better name for the >function. Maybe microcode_update_one()? Will do. >> +/* >> + * Load a microcode update to current CPU. >> + * >> + * If no patch is provided, the cached patch will be loaded.
Microcode >> update >> + * during APs bringup and CPU resuming falls into this case. >> + */ >> +static int microcode_update_cpu(struct microcode_patch *patch) > >const? > >> { >> -int err; >> -struct cpu_signature *sig = &this_cpu(cpu_sig); >> +int ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig)); >> >> -if ( !microcode_ops ) >> -return 0; >> +if ( unlikely(ret) ) >> +return ret; >> >> spin_lock(µcode_mutex); >> >> -err = microcode_ops->collect_cpu_info(sig); >> -if ( likely(!err) ) >> -err = microcode_ops->apply_microcode(); >> -spin_unlock(µcode_mutex); >> +if ( patch ) >> +{ >> +/* >> + * If a patch is specified, it should has newer revision than >> + * that of the patch cached. >> + */ >> +if ( microcode_cache && >> + microcode_ops->compare_patch(patch, microcode_cache) != >> NEW_UCODE ) >> +{ >> +spin_unlock(µcode_mutex); >> +return -EINVAL; >> +} >> >> -return err; >> -} >> +ret = microcode_ops->apply_microcode(patch); > >There's no printk() here but ... > >> +} >> +else if ( microcode_cache ) >> +{ >> +ret = microcode_ops->apply_microcode(microcode_cache); >> +if ( ret == -EIO ) >> +printk("Update failed. Reboot needed\n"); > >... you emit a log message h
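For context, a minimal sketch of what the renamed helper suggested above could look like, assuming it keeps the "NULL means load the cached patch" convention of the quoted microcode_update_cpu(); the name microcode_update_one() is only Jan's suggestion here, and the error convention is an assumption, not the committed code:

/* Hypothetical wrapper: load the cached ucode on the calling CPU. */
int microcode_update_one(void)
{
    if ( !microcode_ops )
        return -EOPNOTSUPP;          /* assumed error convention */

    /* NULL patch: apply whatever has been cached, if anything. */
    return microcode_update_cpu(NULL);
}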
Re: [Xen-devel] [PATCH v7 06/10] microcode: split out apply_microcode() from cpu_request_microcode()
On Tue, Jun 11, 2019 at 01:08:36AM -0600, Jan Beulich wrote:
On 11.06.19 at 05:32, wrote:
>> On Wed, Jun 05, 2019 at 06:37:27AM -0600, Jan Beulich wrote:
>> On 27.05.19 at 10:31, wrote:
During late microcode update, apply_microcode() is invoked in
cpu_request_microcode(). To make late microcode update more reliable,
we want to put apply_microcode() into stop_machine context. So
we split it out from cpu_request_microcode(). As a consequence,
apply_microcode() should be invoked explicitly in the common code.

Previously, apply_microcode() got the microcode patch to be applied from
the microcode cache. Now, the patch is passed as a function argument and
a patch is cached for cpu-hotplug and cpu resuming only after it has
been loaded to a cpu without any error. As a consequence, the
'match_cpu' check in microcode_update_cache is removed, which otherwise
would fail.
>>>
>>>The "only after it has been loaded to a cpu without any error" is a
>>>problem, precisely for the case where ucode on the different cores
>>>is not in sync initially. I would actually like to put up this question:
>>>When a core has no ucode loaded at all yet and only strictly older
>>>(than loaded on some other cores) ucode is found to be available,
>>>whether then it wouldn't still be better to apply that ucode to
>>>_at least_ the cores that have none loaded yet.
>>
>> Yes, it is better for this special case. And I agree to support this case.
>>
>> In this v7, a patch is loaded only if its revision is newer than that
>> loaded on the current CPU. And it is stored only if it has been loaded
>> successfully. But, as you described, a broken BIOS might put the system
>> in an inconsistent state (multiple microcode revisions in the system) and,
>> furthermore, in this case, if no or only an older microcode update is
>> provided, early loading cannot get the system into a sane state. So for
>> both early and late microcode loading, we could face a situation where
>> the patch to be loaded has a revision equal to or older than the
>> microcode on some CPUs.
>>
>> Changes I plan to make in the next version are:
>> 1. For early microcode, a patch would be stored if it covers the current
>> CPU's signature. All CPUs would try to load from the cache.
>> 2. For late microcode, a patch is loaded only if its revision is newer than
>> *the patch cached*. And it is stored only if it has been loaded without an
>> "EIO" error.
>> 3. Cache replacement remains the same.
>
>Why the difference between early and late loading?

Storing a patch without loading it is problematic. We would need complex
logic to restore the old patch if the current patch proved to be broken.
I really want to avoid going that way. So for late microcode, we still
stick to the rule: a patch is stored only after it has been loaded.

For late loading, we can try to load a patch as long as it covers the
current CPU signature, to avoid missing any possible update. But thanks to
early loading, the oldest microcode revision on all online CPUs shouldn't
be older than the cache. So, as an optimization, we initiate a system-wide
update only if the patch's revision is newer than the cache.

For early loading, to avoid discarding a potentially useful patch, an
exception is made: the newest matching patch is stored without being
loaded, and all CPUs then try to load it.

One problem is that if a broken patch with a very high revision is
provided, any subsequent late loading attempt would fail. It is unlikely
to happen, so I plan to leave it aside. Otherwise, we can clean up the
cache in microcode_init() if no CPU has loaded this patch (we would need
a global variable to track the status).

>
>> But it is a temporary solution, especially for CSPs. A better way
>> might be getting the newest ucode or upgrading to the newest BIOS, or
>> even downgrading the BIOS to an older version which wouldn't put
>> the system into an "insane" state.
>
>On the quoted system, all BIOS versions I've ever been provided
>had the same odd behavior.
>
+static int microcode_update_cpu(struct microcode_patch *patch)
 {
-    int err;
-    struct cpu_signature *sig = &this_cpu(cpu_sig);
+    int ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));

-    if ( !microcode_ops )
-        return 0;
+    if ( unlikely(ret) )
+        return ret;

     spin_lock(&microcode_mutex);

-    err = microcode_ops->collect_cpu_info(sig);
-    if ( likely(!err) )
-        err = microcode_ops->apply_microcode();
-    spin_unlock(&microcode_mutex);
+    if ( patch )
+    {
+        /*
+         * If a patch is specified, it should have a newer revision than
+         * that of the patch cached.
+         */
+        if ( microcode_cache &&
+             microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
+        {
+            spin_unlock(&microcode_mutex);
+            r
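Reading the plan above as code may help. This is a rough sketch only: patch_covers_cpu_sig(), save_patch() and start_update() are hypothetical helper names, while compare_patch() and NEW_UCODE come from the quoted series:

/* Early loading (boot): store any matching patch, let all CPUs load it. */
if ( patch_covers_cpu_sig(patch, &this_cpu(cpu_sig)) )  /* hypothetical */
    save_patch(patch);                                  /* hypothetical */

/* Late loading (runtime): only start a system-wide update if the new
 * patch is strictly newer than the cached one. */
if ( !microcode_cache ||
     microcode_ops->compare_patch(patch, microcode_cache) == NEW_UCODE )
    ret = start_update(patch);                          /* hypothetical */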
Re: [Xen-devel] [PATCH v7 08/10] x86/microcode: Synchronize late microcode loading
On Wed, Jun 05, 2019 at 08:09:43AM -0600, Jan Beulich wrote:
>>>> On 27.05.19 at 10:31, wrote:
>> This patch ports microcode improvement patches from linux kernel.
>>
>> Before you read any further: the early loading method is still the
>> preferred one and you should always do that. The following patch is
>> improving the late loading mechanism for long running jobs and cloud use
>> cases.
>>
>> Gather all cores and serialize the microcode update on them by doing it
>> one-by-one to make the late update process as reliable as possible and
>> avoid potential issues caused by the microcode update.
>>
>> Signed-off-by: Chao Gao
>> Tested-by: Chao Gao
>> [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>> [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>> Cc: Kevin Tian
>> Cc: Jun Nakajima
>> Cc: Ashok Raj
>> Cc: Borislav Petkov
>> Cc: Thomas Gleixner
>> Cc: Andrew Cooper
>> Cc: Jan Beulich
>> ---
>> Changes in v7:
>>  - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
>>  - reword the comment above microcode_update_cpu() to clearly state that
>>    one thread per core should do the update.
>>
>> Changes in v6:
>>  - Use one timeout period for rendezvous stage and another for update stage.
>>  - scale time to wait by the number of remaining cpus to respond.
>>    It helps to find something wrong earlier and thus we can reboot the
>>    system earlier.
>> ---
>>  xen/arch/x86/microcode.c | 171 ++-
>>  1 file changed, 155 insertions(+), 16 deletions(-)
>>
>> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> index 23cf550..f4a417e 100644
>> --- a/xen/arch/x86/microcode.c
>> +++ b/xen/arch/x86/microcode.c
>> @@ -22,6 +22,7 @@
>>   */
>>
>>  #include
>> +#include
>
>It seems vanishingly unlikely that you would need this explicit #include
>here, but it certainly isn't wrong.
>
>> @@ -270,31 +296,90 @@ bool microcode_update_cache(struct microcode_patch *patch)
>>      return true;
>>  }
>>
>> -static long do_microcode_update(void *patch)
>> +/* Wait for CPUs to rendezvous with a timeout (us) */
>> +static int wait_for_cpus(atomic_t *cnt, unsigned int expect,
>> +                         unsigned int timeout)
>>  {
>> -    int error, cpu;
>> -
>> -    error = microcode_update_cpu(patch);
>> -    if ( error )
>> +    while ( atomic_read(cnt) < expect )
>>      {
>> -        microcode_ops->free_patch(microcode_cache);
>> -        return error;
>> +        if ( !timeout )
>> +        {
>> +            printk("CPU%d: Timeout when waiting for CPUs calling in\n",
>> +                   smp_processor_id());
>> +            return -EBUSY;
>> +        }
>> +        udelay(1);
>> +        timeout--;
>>      }
>
>There's no comment here and nothing in the description: I don't
>recall clarification as to whether RDTSC is fine to be issued by a
>thread when ucode is being updated by another thread on the
>same core.

Yes. I think it is fine. Ashok, could you share your opinion on this
question?

>
>> +static int do_microcode_update(void *patch)
>> +{
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int cpu_nr = num_online_cpus();
>> +    unsigned int finished;
>> +    int ret;
>> +    static bool error;
>>
>> -    microcode_update_cache(patch);
>> +    atomic_inc(&cpu_in);
>> +    ret = wait_for_cpus(&cpu_in, cpu_nr, MICROCODE_CALLIN_TIMEOUT_US);
>> +    if ( ret )
>> +        return ret;
>>
>> -    return error;
>> +    ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
>> +    /*
>> +     * Load microcode update on only one logical processor per core.
>> +     * Here, among logical processors of a core, the one with the
>> +     * lowest thread id is chosen to perform the loading.
>> +     */
>> +    if ( !ret && (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))) )
>
>At the very least it's not obvious whether this hyper-threading-centric
>view ("logical processor") also applies to AMD's compute unit model
>(which reuses cpu_sibling_mask). It does, as the respective MSRs are
>per-compute-unit rather than per-core, but I'd appreciate if the
>wording could be adjusted to explicitly name both cases (multiple
>threads per core and multiple
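The posting is truncated by the archive here. Putting the rendezvous pieces quoted above together, the overall flow is roughly the following sketch; cpu_out and MICROCODE_UPDATE_TIMEOUT_US are assumed names for the second counter and the second timeout period mentioned in the v6 changelog:

static int do_microcode_update(void *patch)
{
    unsigned int cpu = smp_processor_id();
    unsigned int cpu_nr = num_online_cpus();
    int ret;

    /* Phase 1: every CPU checks in. */
    atomic_inc(&cpu_in);
    ret = wait_for_cpus(&cpu_in, cpu_nr, MICROCODE_CALLIN_TIMEOUT_US);
    if ( ret )
        return ret;

    /* Phase 2: one thread per core performs the load. */
    if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
        ret = microcode_ops->apply_microcode(patch);

    /* Phase 3: wait for all CPUs before anyone resumes normal work. */
    atomic_inc(&cpu_out);                               /* assumed name */
    return wait_for_cpus(&cpu_out, cpu_nr,
                         MICROCODE_UPDATE_TIMEOUT_US) ?: ret; /* assumed */
}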
Re: [Xen-devel] [PATCH v7 09/10] microcode: remove microcode_update_lock
On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>> On 27.05.19 at 10:31, wrote:
>> microcode_update_lock is to prevent logical threads of the same core from
>> updating microcode at the same time. But due to using a global lock, it
>> also prevented parallel microcode updating on different cores.
>>
>> Remove this lock in order to update microcode in parallel. It is safe
>> because we have already ensured serialization of sibling threads at the
>> caller side.
>> 1. For late microcode update, do_microcode_update() ensures that only one
>>    sibling thread of a core can update microcode.
>> 2. For microcode update during system startup or CPU-hotplug,
>>    microcode_mutex() guarantees update serialization of logical threads.
>> 3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
>>    late microcode update.
>>
>> Note that printk in apply_microcode() and svm_host_osvw_init() (for AMD
>> only) are still processed sequentially.
>>
>> Signed-off-by: Chao Gao
>
>Reviewed-by: Jan Beulich

Thanks.

>
>> ---
>> Changes in v7:
>>  - reworked. Remove complex lock logic introduced in v5 and v6. The
>>    microcode patch to be applied is passed as an argument without any
>>    global variable. Thus no lock is added to serialize potential
>>    readers/writers. Callers of apply_microcode() will guarantee the
>>    correctness: the patch pointed to by the argument won't be changed
>>    by others.
>
>Much better this way indeed.
>
>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>
>>      mc_intel = patch->mc_intel;
>>
>> -    /* serialize access to the physical write to MSR 0x79 */
>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>> +    BUG_ON(local_irq_is_enabled());
>>
>>      /*
>>       * Writeback and invalidate caches before updating microcode to avoid
>
>Thinking about it - what happens if we hit an NMI or #MC here?
>watchdog_disable(), a call to which you add in an earlier patch,
>doesn't really suppress the generation of NMIs, it only tells the
>handler not to look at the accumulated statistics.

I think they should be suppressed. Ashok, could you confirm it? I will
figure out how to suppress them in Xen.

Thanks
Chao
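The argument in the commit message can be condensed into an invariant at the top of apply_microcode(); this is an illustration only, assuming the caller-side guarantees listed above hold:

/* What the removed global lock is replaced by: */
BUG_ON(local_irq_is_enabled());   /* as in the quoted hunk */
/*
 * Serialization now comes from the callers: for late loading, only
 * cpumask_first(per_cpu(cpu_sibling_mask, cpu)) reaches this point;
 * for boot/hotplug/resume, microcode_mutex serializes logical threads.
 */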
Re: [Xen-devel] [PATCH v7 10/10] x86/microcode: always collect_cpu_info() during boot
On Wed, Jun 05, 2019 at 09:05:49AM -0600, Jan Beulich wrote:
>>>> On 27.05.19 at 10:31, wrote:
>> From: Sergey Dyasli
>>
>> Currently cpu_sig struct is not updated during boot when either:
>>
>> 1. ucode_scan is set to false (e.g. no "ucode=scan" in cmdline)
>> 2. initrd does not contain a microcode blob
>
>I thought we'd already discussed this - "ucode=" is not
>covered by this.
>
>> These will result in cpu_sig.rev being 0 which affects APIC's
>> check_deadline_errata() and retpoline_safe() functions.
>>
>> Fix this by getting ucode revision early during boot and SMP bring up.
>> While at it.
>
>While at it?
>
>> Signed-off-by: Sergey Dyasli
>> Signed-off-by: Chao Gao
>> ---
>> changes in v7:
>>  - rebase on patch 1~9
>
>From the looks of it this doesn't depend on any of the earlier changes
>(except the ucode_cpu_info -> cpu_sig change), and hence could go
>in right away. Am I overlooking something? If not, all that's needed
>would be clarifications of the description as per above.

I think not. Will send this patch separately.

Thanks
Chao
Re: [Xen-devel] [PATCH v7 10/10] x86/microcode: always collect_cpu_info() during boot
On Wed, Jun 05, 2019 at 04:56:01PM +0200, Roger Pau Monné wrote:
>On Mon, May 27, 2019 at 04:31:31PM +0800, Chao Gao wrote:
>> From: Sergey Dyasli
>>
>> Currently cpu_sig struct is not updated during boot when either:
>>
>> 1. ucode_scan is set to false (e.g. no "ucode=scan" in cmdline)
>> 2. initrd does not contain a microcode blob
>>
>> These will result in cpu_sig.rev being 0 which affects APIC's
>> check_deadline_errata() and retpoline_safe() functions.
>>
>> Fix this by getting ucode revision early during boot and SMP bring up.
>> While at it.
>
>I don't understand the last "While at it" sentence. Can it be
>removed?

Yes.

>
>Is this an issue with current code? If so this could be merged ahead of
>the rest of the series, and should likely be patch 1.
>
>OTOH if the issue this patch is fixing is introduced by this series
>please merge the fix with the respective patch that introduced the
>bug.

It is the former. Will send it separately. Really appreciate your other
comments.

Thanks
Chao
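A sketch of the fix being discussed, under the assumption that the call is added to both the BSP boot path and the AP bring-up path (the helper name and exact call sites are assumptions; collect_cpu_info() is the hook from the series):

/* Always record the running ucode revision, even when no update blob
 * is available, so cpu_sig.rev is never left at 0. */
static void collect_cpu_sig_early(void)   /* hypothetical helper */
{
    if ( microcode_ops )
        microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
    /* check_deadline_errata()/retpoline_safe() can then trust cpu_sig.rev */
}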
[Xen-devel] an assertion triggered when running Xen on a HSW desktop
The output of lscpu is:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Stepping:              3
CPU MHz:               3528.421
BogoMIPS:              7183.36
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

Serial console output is:

(XEN) Xen version 4.12-unstable (root@) (gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0) debug=y Tue Jan 15 07:25:29 UTC 2019
(XEN) Latest ChangeSet: Mon Dec 17 09:22:59 2018 + git:a5b0eb3636
(XEN) Console output is synchronous.
(XEN) Bootloader: GRUB 2.02-2ubuntu8.2
(XEN) Command line: dom0=pvh iommu=no-intremap console=com1 com1=115200,8n1 sync_console noreboot=true placeholder
(XEN) Xen image load base address: 0
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) - 0009d800 (usable)
(XEN) 0009d800 - 000a (reserved)
(XEN) 000e - 0010 (reserved)
(XEN) 0010 - cee8c000 (usable)
(XEN) cee8c000 - cee93000 (ACPI NVS)
(XEN) cee93000 - cf2c9000 (usable)
(XEN) cf2c9000 - cf76 (reserved)
(XEN) cf76 - d7eeb000 (usable)
(XEN) d7eeb000 - d800 (reserved)
(XEN) d800 - d876 (usable)
(XEN) d876 - d880 (reserved)
(XEN) d880 - d8fae000 (usable)
(XEN) d8fae000 - d900 (ACPI data)
(XEN) d900 - da71c000 (usable)
(XEN) da71c000 - da80 (ACPI NVS)
(XEN) da80 - dbe11000 (usable)
(XEN) dbe11000 - dc00 (reserved)
(XEN) dd00 - df20 (reserved)
(XEN) f800 - fc00 (reserved)
(XEN) fec0 - fec01000 (reserved)
(XEN) fed0 - fed04000 (reserved)
(XEN) fed1c000 - fed2 (reserved)
(XEN) fee0 - fee01000 (reserved)
(XEN) ff00 - 0001 (reserved)
(XEN) 0001 - 00021ee0 (usable)
(XEN) New Xen image base address: 0xdb80
(XEN) ACPI: RSDP 000F0490, 0024 (r2 DELL )
(XEN) ACPI: XSDT D8FEE088, 0094 (r1 DELLCBX3 1072009 AMI 10013)
(XEN) ACPI: FACP D8FF9B30, 010C (r5 DELLCBX3 1072009 AMI 10013)
(XEN) ACPI: DSDT D8FEE1B0, B97E (r2 DELLCBX3 14 INTL 20091112)
(XEN) ACPI: FACS DA7FE080, 0040
(XEN) ACPI: APIC D8FF9C40, 0092 (r3 DELLCBX3 1072009 AMI 10013)
(XEN) ACPI: FPDT D8FF9CD8, 0044 (r1 DELLCBX3 1072009 AMI 10013)
(XEN) ACPI: SLIC D8FF9D20, 0176 (r3 DELLCBX3 1072009 MSFT10013)
(XEN) ACPI: LPIT D8FF9E98, 005C (r1 DELLCBX3 0 AMI.5)
(XEN) ACPI: SSDT D8FF9EF8, 0539 (r1 PmRef Cpu0Ist 3000 INTL 20120711)
(XEN) ACPI: SSDT D8FFA438, 0AD8 (r1 PmRefCpuPm 3000 INTL 20120711)
(XEN) ACPI: SSDT D8FFAF10, 01C7 (r1 PmRef LakeTiny 3000 INTL 20120711)
(XEN) ACPI: HPET D8FFB0D8, 0038 (r1 DELLCBX3 1072009 AMI.5)
(XEN) ACPI: SSDT D8FFB110, 036D (r1 SataRe SataTabl 1000 INTL 20120711)
(XEN) ACPI: MCFG D8FFB480, 003C (r1 DELLCBX3 1072009 MSFT 97)
(XEN) ACPI: SSDT D8FFB4C0, 34D6 (r1 SaSsdt SaSsdt 3000 INTL 20091112)
(XEN) ACPI: ASF! D8FFE998, 00A5 (r32 INTEL HCG1 TFSMF4240)
(XEN) ACPI: DMAR D8FFEA40, 00B8 (r1 INTEL HSW 1 INTL1)
(XEN) System RAM: 8100MB (8294548kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at -00021ee0
(XEN) Domain heap initialised
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 60 (0x3c), Stepping 3 (raw 000306c3)
(XEN) found SMP MP-table at 000fd970
(XEN) DMI 2.7 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x1808 (32 bits)
(XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - da7fe080/, using 32
(XEN) ACPI: wakeup_vec[da7fe08c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lap
Re: [Xen-devel] an assertion triggered when running Xen on a HSW desktop
On Tue, Jan 15, 2019 at 09:18:25AM +0100, Roger Pau Monné wrote:
>On Tue, Jan 15, 2019 at 04:04:40PM +0800, Chao Gao wrote:
>[...]
>> (XEN) Xen version 4.12-unstable (root@) (gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0) debug=y Tue Jan 15 07:25:29 UTC 2019
>> (XEN) Latest ChangeSet: Mon Dec 17 09:22:59 2018 + git:a5b0eb3636
>[...]
>> (XEN) *** Building a PVH Dom0 ***
>> (XEN) Assertion 'IS_ALIGNED(dfn_x(dfn), (1ul << page_order))' failed at iommu.c:323
>> (XEN) [ Xen-4.12-unstable x86_64 debug=y Tainted: C ]
>> (XEN) CPU:0
>> (XEN) RIP:e008:[] iommu_map+0xba/0x176
>> (XEN) RFLAGS: 00010202 CONTEXT: hypervisor
>> (XEN) rax: rbx: 0007 rcx: 0003
>> (XEN) rdx: 0020f081 rsi: 0001 rdi: 830215242000
>> (XEN) rbp: 82d080497bb8 rsp: 82d080497b58 r8:
>> (XEN) r9: 82d080497bd4 r10: 0180 r11: 7fff
>> (XEN) r12: 830215242000 r13: r14: 0001
>> (XEN) r15: 0001 cr0: 8005003b cr4: 001526e0
>> (XEN) cr3: dbc8d000 cr2:
>> (XEN) fsb: gsb: gss:
>> (XEN) ds: es: fs: gs: ss: cs: e008
>> (XEN) Xen code around (iommu_map+0xba/0x176):
>> (XEN) 41 89 c5 e9 a2 00 00 00 <0f> 0b 0f 0b 41 89 c5 41 80 bc 24 c0 01 00 00 00
>> (XEN) Xen stack trace from rsp=82d080497b58:
>> (XEN)0020 82d080497ba8 82d08023d489
>> (XEN)0001 82e0041e1000 830215242000 82e0041e1020
>> (XEN)830215242000 0001 0001
>> (XEN)82d080497c08 82d0804182d8 82d080497c08 0001
>> (XEN)009d 0001 82d080444c68
>> (XEN)0020b43e 830215242000 82d080497d58 82d08043716c
>> (XEN)82d080497fff 0001 82d08046cbc0 8309bf40
>> (XEN)01e33000 8309bf30 830213525000 0010b43e
>> (XEN)82db 01f4 0010 001f15242000
>> (XEN)00ff82d080497cb8 82d08020a8e4 82d0805cf02c 82d08048f740
>> (XEN)0092 82d08023dddb 82d080497fff 82d08048f740
>> (XEN)82d080497cc8 82d08023de2e 82d080497ce8 82d08024016b
>> (XEN)82d080497ce8 82d08023de78 82d080497d08 82d080240205
>> (XEN)82d0805a3880 82d0805a3880 82d080497d48 82d08023d489
>> (XEN)8302152e1550 8309bf30 01e33000 8309bf30
>> (XEN)01e33000 8309bf40 82d08046cbc0 830215242000
>> (XEN)82d080497d98 82d08043e53c 82d080497d98 82d08046cbc0
>> (XEN)8302152e1550 0001 82d0805d00d0 0008
>> (XEN)82d080497ee8 82d08042d8ef 0002
>> (XEN)0002 0002 0002 0002
>> (XEN) Xen call trace:
>> (XEN)[] iommu_map+0xba/0x176
>> (XEN)[] iommu_hwdom_init+0xef/0x220
>> (XEN)[] dom0_construct_pvh+0x189/0x129e
>> (XEN)[] construct_dom0+0xd4/0xb14
>> (XEN)[] __start_xen+0x2710/0x2830
>> (XEN)[] __high_start+0x53/0x55
>> (XEN)
>> (XEN)
>> (XEN)
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'IS_ALIGNED(dfn_x(dfn), (1ul << page_order))' failed at iommu.c:323
>> (XEN)
>
>Oh, this was added by Paul quite recently. You seem to be using a
>rather old commit (a5b0eb3636), is there any reason for using such an
>old baseline?

I was using the master branch. Your patch below did fix this issue.

Thanks
Chao
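For reference, the failing check is purely an alignment test; in Xen, IS_ALIGNED() expands to roughly the following, so an order-N mapping requires the low N bits of the dfn to be clear (the expansion shown is an approximation of the tree's macro):

/* IS_ALIGNED(val, align), with align a power of two: */
#define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)
/* e.g. a 2M (order-9) IOMMU mapping needs (dfn & 0x1ff) == 0. */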
[Xen-devel] [PATCH v5 3/3] xen/pt: initialize 'warned' field of arch_msix properly
Also clean up current code by moving initialization of arch specific
fields out of common code.

Signed-off-by: Chao Gao
Reviewed-by: Jan Beulich
Reviewed-by: Roger Pau Monné
---
Changes in v5:
 - rename init_arch_msix to arch_init_msix
 - place arch_init_msix right after the definition of arch_msix

Changes in v4:
 - newly added
---
 xen/drivers/passthrough/pci.c | 2 +-
 xen/include/asm-x86/msi.h | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 4f2be02..95fc06b 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -367,7 +367,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             xfree(pdev);
             return NULL;
         }
-        spin_lock_init(&msix->table_lock);
+        arch_init_msix(msix);
         pdev->msix = msix;
     }

diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 10387dc..7b13c07 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -242,6 +242,12 @@ struct arch_msix {
     domid_t warned;
 };

+static inline void arch_init_msix(struct arch_msix *msix)
+{
+    spin_lock_init(&msix->table_lock);
+    msix->warned = DOMID_INVALID;
+}
+
 void early_msi_init(void);
 void msi_compose_msg(unsigned vector, const cpumask_t *mask,
                      struct msi_msg *msg);
--
1.8.3.1
[Xen-devel] [PATCH v5 2/3] libxl: don't reset device when it is accessible by the guest
When I destroyed a guest with 'xl destroy', I found that the warning in
msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)" to
that place, I got the call trace below:

(XEN) Xen call trace:
(XEN)    [] msi.c#msi_set_mask_bit+0x1da/0x29b
(XEN)    [] guest_mask_msi_irq+0x1c/0x1e
(XEN)    [] vmsi.c#msixtbl_write+0x173/0x1d4
(XEN)    [] vmsi.c#_msixtbl_write+0x16/0x18
(XEN)    [] hvm_process_io_intercept+0x216/0x270
(XEN)    [] hvm_io_intercept+0x27/0x4c
(XEN)    [] emulate.c#hvmemul_do_io+0x273/0x454
(XEN)    [] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
(XEN)    [] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
(XEN)    [] emulate.c#linear_write+0xdd/0x13b
(XEN)    [] emulate.c#hvmemul_write+0xbd/0xf1
(XEN)    [] x86_emulate+0x2249d/0x23c5c
(XEN)    [] x86_emulate_wrapper+0x2b/0x5f
(XEN)    [] emulate.c#_hvm_emulate_one+0x54/0x1b2
(XEN)    [] hvm_emulate_one+0x10/0x12
(XEN)    [] hvm_emulate_one_insn+0x42/0x14a
(XEN)    [] handle_mmio_with_translation+0x4f/0x51
(XEN)    [] hvm_hap_nested_page_fault+0x16c/0x6d8
(XEN)    [] vmx_vmexit_handler+0x19b0/0x1f2e
(XEN)    [] vmx_asm_vmexit_handler+0xfa/0x270

It seems to me that the guest is trying to mask an MSI while the memory
decoding of the device is disabled. Performing a device reset without a
proper method to prevent the guest's MSI-X operations leads to this
issue. The fix is basic - detach the pci device before resetting the
device.

Signed-off-by: Chao Gao
Reviewed-by: Roger Pau Monné
Acked-by: Wei Liu
---
 tools/libxl/libxl_pci.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..855fb71 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1459,17 +1459,17 @@ skip1:
         fclose(f);
     }
 out:
-    /* don't do multiple resets while some functions are still passed through */
-    if ( (pcidev->vdevfn & 0x7) == 0 ) {
-        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
-    }
-
     if (!isstubdom) {
         rc = xc_deassign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
         if (rc < 0 && (hvm || errno != ENOSYS))
             LOGED(ERROR, domainid, "xc_deassign_device failed");
     }

+    /* don't do multiple resets while some functions are still passed through */
+    if ( (pcidev->vdevfn & 0x7) == 0 ) {
+        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+    }
+
     stubdomid = libxl_get_stubdom_id(ctx, domid);
     if (stubdomid != 0) {
         libxl_device_pci pcidev_s = *pcidev;
--
1.8.3.1
[Xen-devel] [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
I find some pass-thru devices don't work any more across guest
reboot. Assigning such a device to another domain also meets the same
issue. And the only way to make it work again is un-binding and binding
it to pciback. Someone reported this issue one year ago [1].

If the device's driver doesn't disable MSI-X during shutdown or qemu is
killed/crashed before the domain shutdown, this domain's pirq won't be
unmapped. Then Xen takes over this work, unmapping all pirq-s, when
destroying the guest. But as pciback has already disabled memory decoding
before Xen unmaps the pirq, Xen has to set the host_maskall flag and the
maskall bit to mask an MSI rather than set the maskbit in the MSI-X
table. The call trace of this process is:

->arch_domain_destroy
->free_domain_pirqs
->unmap_domain_pirq (if pirq isn't unmapped by qemu)
->pirq_guest_force_unbind
->__pirq_guest_unbind
->mask_msi_irq(=desc->handler->disable())
->the warning in msi_set_mask_bit()

The host_maskall bit will prevent guests from clearing the maskall bit
even when the device is assigned to another guest later. Then guests
cannot receive MSIs from this device.

To fix this issue, a pirq is unmapped before memory decoding is disabled
by pciback. Specifically, when a device is detached from a guest, all
established mappings between pirq and msi are destroyed before changing
the ownership.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Signed-off-by: Chao Gao
---
Changes in v5:
 - fix the potential infinite loop
 - assert that unmap_domain_pirq() won't fail
 - assert msi_list is empty after the loop in pci_unmap_msi
 - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
   compilation error when building PVShim

Changes in v4:
 - split out change to 'msix->warned' field
 - handle multiple msi cases
 - use list_first_entry_or_null to traverse 'pdev->msi_list'
---
 xen/drivers/passthrough/io.c | 57 ++
 xen/drivers/passthrough/pci.c | 64 +++
 xen/include/xen/iommu.h | 4 +++
 3 files changed, 107 insertions(+), 18 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index a6eb8a4..56ee1ef 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -619,6 +619,42 @@ int pt_irq_create_bind(
     return 0;
 }

+static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
+         list_empty(&pirq_dpci->digl_list) )
+    {
+        pirq_guest_unbind(d, pirq);
+        msixtbl_pt_unregister(d, pirq);
+        if ( pt_irq_need_timer(pirq_dpci->flags) )
+            kill_timer(&pirq_dpci->timer);
+        pirq_dpci->flags = 0;
+        /*
+         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
+         * call to pt_pirq_softirq_reset.
+         */
+        pt_pirq_softirq_reset(pirq_dpci);
+
+        pirq_cleanup_check(pirq, d);
+    }
+}
+
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && pirq_dpci->gmsi.posted )
+        pi_update_irte(NULL, pirq, 0);
+
+    pt_irq_destroy_bind_common(d, pirq);
+}
+
 int pt_irq_destroy_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
         }
         else
             what = "bogus";
-    }
-    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
-        pi_update_irte(NULL, pirq, 0);
-
-    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
-         list_empty(&pirq_dpci->digl_list) )
-    {
-        pirq_guest_unbind(d, pirq);
-        msixtbl_pt_unregister(d, pirq);
-        if ( pt_irq_need_timer(pirq_dpci->flags) )
-            kill_timer(&pirq_dpci->timer);
-        pirq_dpci->flags = 0;
-        /*
-         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
-         * call to pt_pirq_softirq_reset.
-         */
-        pt_pirq_softirq_reset(pirq_dpci);

-        pirq_cleanup_check(pirq, d);
+        pt_irq_destroy_bind_common(d, pirq);
     }
+    else
+        pt_irq_destroy_bind_msi(d, pirq);

     spin_unlock(&d->event_lock);

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 93c20b9..4f2be02 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bu
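The pci.c hunk above is truncated by the archive. Based on the changelog ("assert that unmap_domain_pirq() won't fail", "assert msi_list is empty after the loop in pci_unmap_msi") and the fragments quoted in the review that follows, pci_unmap_msi() looks roughly like this sketch; the exact pirq lookup and multi-vector handling are assumptions:

static void pci_unmap_msi(struct pci_dev *pdev)
{
    struct msi_desc *entry, *tmp;
    struct domain *d = pdev->domain;

    ASSERT(pcidevs_locked());
    ASSERT(d);

    spin_lock(&d->event_lock);
    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
    {
        unsigned int i, nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
                             ? entry->msi.nvec : 1;

        for ( i = 0; i < nr; i++ )
        {
            /* Assumed lookup: translate the irq back to the guest pirq. */
            struct pirq *info = pirq_info(d,
                                  domain_irq_to_pirq(d, entry->irq + i));

            if ( info )
                pt_irq_destroy_bind_msi(d, info);
        }

        /* Per the changelog, this cannot fail at this point. */
        if ( unmap_domain_pirq(d, domain_irq_to_pirq(d, entry->irq)) )
            ASSERT_UNREACHABLE();
    }
    /* All msi_desc entries must be gone now, per the changelog. */
    ASSERT(list_empty(&pdev->msi_list));
    spin_unlock(&d->event_lock);
}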
Re: [Xen-devel] [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>> I find some pass-thru devices don't work any more across guest
>> reboot. Assigning such a device to another domain also meets the same
>> issue. And the only way to make it work again is un-binding and binding
>> it to pciback. Someone reported this issue one year ago [1].
>>
>> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> killed/crashed before the domain shutdown, this domain's pirq won't be
>> unmapped. Then Xen takes over this work, unmapping all pirq-s, when
>> destroying the guest. But as pciback has already disabled memory decoding
>> before Xen unmaps the pirq, Xen has to set the host_maskall flag and the
>> maskall bit to mask an MSI rather than set the maskbit in the MSI-X
>> table. The call trace of this process is:
>>
>> ->arch_domain_destroy
>> ->free_domain_pirqs
>> ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>> ->pirq_guest_force_unbind
>> ->__pirq_guest_unbind
>> ->mask_msi_irq(=desc->handler->disable())
>> ->the warning in msi_set_mask_bit()
>>
>> The host_maskall bit will prevent guests from clearing the maskall bit
>> even when the device is assigned to another guest later. Then guests
>> cannot receive MSIs from this device.
>>
>> To fix this issue, a pirq is unmapped before memory decoding is disabled
>> by pciback. Specifically, when a device is detached from a guest, all
>> established mappings between pirq and msi are destroyed before changing
>> the ownership.
>>
>> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>
>Thanks, I think the approach is fine, just a couple of comments.
>
>> Signed-off-by: Chao Gao
>> ---
>> Changes in v5:
>>  - fix the potential infinite loop
>>  - assert that unmap_domain_pirq() won't fail
>>  - assert msi_list is empty after the loop in pci_unmap_msi
>>  - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
>>    compilation error when building PVShim
>>
>> Changes in v4:
>>  - split out change to 'msix->warned' field
>>  - handle multiple msi cases
>>  - use list_first_entry_or_null to traverse 'pdev->msi_list'
>> ---
>>  xen/drivers/passthrough/io.c | 57 ++
>>  xen/drivers/passthrough/pci.c | 64 +++
>>  xen/include/xen/iommu.h | 4 +++
>>  3 files changed, 107 insertions(+), 18 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> index a6eb8a4..56ee1ef 100644
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>>      return 0;
>>  }
>>
>> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> +         list_empty(&pirq_dpci->digl_list) )
>> +    {
>> +        pirq_guest_unbind(d, pirq);
>> +        msixtbl_pt_unregister(d, pirq);
>> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> +            kill_timer(&pirq_dpci->timer);
>> +        pirq_dpci->flags = 0;
>> +        /*
>> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> +         * call to pt_pirq_softirq_reset.
>> +         */
>> +        pt_pirq_softirq_reset(pirq_dpci);
>> +
>> +        pirq_cleanup_check(pirq, d);
>> +    }
>> +}
>> +
>> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>
>const
>
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> +        pi_update_irte(NULL, pirq, 0);
>> +
>> +    pt_irq_destroy_bind_common(d, pirq);
>> +}
>> +
>>  int pt_irq_destroy_bind(
>>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>>  {
>> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>>          }
>>
Re: [Xen-devel] [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
On Wed, Jan 16, 2019 at 01:34:28PM +0100, Roger Pau Monné wrote:
>On Wed, Jan 16, 2019 at 07:59:44PM +0800, Chao Gao wrote:
>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>> >On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> >> index 93c20b9..4f2be02 100644
>> >> --- a/xen/drivers/passthrough/pci.c
>> >> +++ b/xen/drivers/passthrough/pci.c
>> >> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>> >>      return rc;
>> >>  }
>> >>
>> >> +/*
>> >> + * Unmap established mappings between domain's pirq and device's MSI.
>> >> + * These mappings were set up by qemu/guest and are expected to be
>> >> + * destroyed when changing the device's ownership.
>> >> + */
>> >> +static void pci_unmap_msi(struct pci_dev *pdev)
>> >> +{
>> >> +    struct msi_desc *entry, *tmp;
>> >> +    struct domain *d = pdev->domain;
>> >> +
>> >> +    ASSERT(pcidevs_locked());
>> >> +    ASSERT(d);
>> >> +
>> >> +    spin_lock(&d->event_lock);
>> >> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
>> >> +    {
>> >> +        struct pirq *info;
>> >> +        int ret, pirq = 0;
>> >> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
>> >> +                          ? entry->msi.nvec : 1;
>> >
>> >I think you should mask the entry, like it's done in
>> >pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
>> >consistent state between bind and unbind.
>>
>> I don't think it is necessary considering that we are about to unmap the
>> pirq. The reason for keeping the state consistent is that we might try
>> to bind the same pirq to another guest interrupt.
>
>Even taking into account that the pirq will be unmapped afterwards I'm
>not sure the state is going to be the same. unmap_domain_pirq doesn't
>seem to mask the MSI entries, and so I wonder whether we could run
>into issues (state not being the expected) when later re-assigning the
>device to another guest.

A valid call trace (in this patch's description) is:

->unmap_domain_pirq (if pirq isn't unmapped by qemu)
->pirq_guest_force_unbind
->__pirq_guest_unbind
->mask_msi_irq(=desc->handler->disable())
->the warning in msi_set_mask_bit()

>
>Maybe I'm missing something, but I would like to make sure the device
>state stays consistent between assignations, at the end of the day the
>problem this patch aims to solve is a state inconsistency between
>device assignations.
>
>> >> +        }
>> >> +    }
>> >> +    /*
>> >> +     * All pirq-s should have been unmapped and corresponding msi_desc
>> >> +     * entries should have been removed in the above loop.
>> >> +     */
>> >> +    ASSERT(list_empty(&pdev->msi_list));
>> >> +
>> >> +    spin_unlock(&d->event_lock);
>> >> +}
>> >> +
>> >>  /* caller should hold the pcidevs_lock */
>> >>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>  {
>> >> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>      if ( !pdev )
>> >>          return -ENODEV;
>> >>
>> >> +    pci_unmap_msi(pdev);
>> >
>> >Just want to make sure, since deassign_device will be called for both
>> >PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>> >device is assigned to a PV guest, but would like your confirmation.
>>
>> TBH, I don't know how device pass-thru is implemented for PV guests.
>> If PV guests also use the same structures and APIs to manage the mapping
>> between msi, pirq and guest interrupt, I think pci_unmap_msi() should
>> also work for the PV guest case.
>
>No, PV guest uses a completely different mechanism. I think
>pci_unmap_msi is safe to be used against PV guests, but it would be
>nice to have some confirmation. The more that there are no
>pci-passthrough tests in osstest, so such error would go unnoticed.

I will do some tests for PV guests.

Thanks
Chao
Re: [Xen-devel] [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
Re: [Xen-devel] [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
On Tue, Jan 22, 2019 at 01:24:48AM -0700, Jan Beulich wrote:
>>>> On 22.01.19 at 06:50, wrote:
>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>>>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>>>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>>>>      if ( !pdev )
>>>>          return -ENODEV;
>>>>
>>>> +    pci_unmap_msi(pdev);
>>>
>>>Just want to make sure, since deassign_device will be called for both
>>>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>>>device is assigned to a PV guest, but would like your confirmation.
>>
>> Tested with a PV guest loaded by Pygrub. PV guests don't suffer the
>> msi-x issue I want to fix.
>>
>> With these three patches applied, I got some error messages from Xen
>> and Dom0 as follows:
>>
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
>> (XEN) irq.c:2148: dom3: pirq 359 not mapped
>> [ 2887.067685] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 358 not mapped
>> [ 2887.075917] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 357 not mapped
>>
>> It seems the cause of these errors is that pirq-s are unmapped and
>> forcibly unbound on deassignment; subsequent pirq unmaps issued by dom0
>> then fail. From some aspects, this error is expected. Because with this
>> patch, pirq-s are expected to be unmapped by qemu or dom0 kernel (for
>> the pv case) before deassignment, and mapping/binding a pirq after
>> deassignment should fail.
>>
>> So what's your opinion on handling such errors? Should we figure out
>> another method to fix the msi-x issue to avoid these errors, or
>> suppress these errors in qemu and the linux kernel?
>
>The "forcing unbind" ones are probably fine to leave alone, but
>the errors would better be avoided in Xen (i.e. without a need
>to also change qemu and/or Linux). Since you don't really say
>when / why these errors now surface, it's hard to suggest what
>might be best to do.

With these patches applied, the errors surface in three cases:
1. destroying a PV guest with assigned devices by "xl destroy"
2. hot-unplugging an assigned device from a PV guest
3. shutting down a PV guest by executing "init 0" in the guest (only for
   some devices whose driver doesn't clean up MSI-X on shutdown)

The reason is: when detaching a device from a domain, the toolstack
always calls xc_deassign_device() prior to
libxl__device_pci_remove_xenstore(). The latter notifies xen_pciback to
clean up the pci device. I guess unbinding and unmapping pirq-s are
steps of this cleanup (just like qemu's role in device deassignment for
HVM guests). But with this patch, pirq-s are forcibly unmapped when
xc_deassign_device() is called. Thus, when xen_pciback tries to unmap
the pirq-s as usual, Xen reports the pirq isn't mapped and propagates
this error to xen_pciback.

Thanks
Chao
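The ordering problem described above can be summarized as the following pseudocode sequence; the function names are those used in the thread, everything else is elided:

/* dom0/toolstack side, on device detach from a PV guest: */
do_pci_remove(...)
{
    xc_deassign_device(...);        /* Xen force-unmaps leftover pirqs   */
    libxl__device_pci_remove_xenstore(...); /* pciback then cleans up and
                                             * tries to unmap the same
                                             * pirqs; Xen reports "not
                                             * mapped" and returns -EINVAL */
}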
Re: [Xen-devel] [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
On Tue, Jan 22, 2019 at 10:18:55AM +0100, Roger Pau Monné wrote:
>On Tue, Jan 22, 2019 at 01:50:20PM +0800, Chao Gao wrote:
>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>> >On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>> >> +        }
>> >> +    }
>> >> +    /*
>> >> +     * All pirq-s should have been unmapped and corresponding msi_desc
>> >> +     * entries should have been removed in the above loop.
>> >> +     */
>> >> +    ASSERT(list_empty(&pdev->msi_list));
>> >> +
>> >> +    spin_unlock(&d->event_lock);
>> >> +}
>> >> +
>> >>  /* caller should hold the pcidevs_lock */
>> >>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>  {
>> >> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>      if ( !pdev )
>> >>          return -ENODEV;
>> >>
>> >> +    pci_unmap_msi(pdev);
>> >
>> >Just want to make sure, since deassign_device will be called for both
>> >PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>> >device is assigned to a PV guest, but would like your confirmation.
>>
>> Tested with a PV guest loaded by Pygrub. PV guests don't suffer the
>> msi-x issue I want to fix.
>>
>> With these three patches applied, I got some error messages from Xen
>> and Dom0 as follows:
>>
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
>> (XEN) irq.c:2148: dom3: pirq 359 not mapped
>> [ 2887.067685] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 358 not mapped
>> [ 2887.075917] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 357 not mapped
>>
>> It seems the cause of these errors is that pirq-s are unmapped and
>> forcibly unbound on deassignment; subsequent pirq unmaps issued by dom0
>> then fail. From some aspects, this error is expected. Because with this
>> patch, pirq-s are expected to be unmapped by qemu or dom0 kernel (for
>> the pv case) before deassignment, and mapping/binding a pirq after
>> deassignment should fail.
>
>This is quite entangled because it involves Xen, libxl and pciback.
>
>AFAICT libxl will already try to unmap the pirqs before deassigning
>the device if the domain is PV, see do_pci_remove in libxl_pci.c and
>the calls it makes to xc_physdev_unmap_pirq.

It seems it only unmaps the pirq bound to INTx.

>
>Which makes me wonder, have you tested if you see those messages about
>pirq unmap failure without this patch applied?

No such errors without my patch.

>
>> So what's your opinion on handling such errors? Should we figure out
>> another method to fix the msi-x issue to avoid these errors, or
>> suppress these errors in qemu and the linux kernel?
>
>Regardless of the reply to the question above, I think
>unmap_domain_pirq should return ESRCH if the pirq cannot be found,
>like the patch below. That would turn the Linux kernel messages into
>less scary info messages, like:
>
>"domain %d does not have %d anymore"
>
>Which seems more accurate.

I agree with you.

Thanks
Chao

>Thanks, Roger.
>
>---8<---
>diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>index 23b4f423e6..7e9c974ba1 100644
>--- a/xen/arch/x86/irq.c
>+++ b/xen/arch/x86/irq.c
>@@ -2144,9 +2144,9 @@ int unmap_domain_pirq(struct domain *d, int pirq)
>     info = pirq_info(d, pirq);
>     if ( !info || (irq = info->arch.irq) <= 0 )
>     {
>-        dprintk(XENLOG_G_ERR, "dom%d: pirq %d not mapped\n",
>+        dprintk(XENLOG_G_INFO, "dom%d: pirq %d not mapped\n",
>                 d->domain_id, pirq);
>-        ret = -EINVAL;
>+        ret = -ESRCH;
>         goto done;
>     }
[Xen-devel] [PATCH v6 2/3] libxl: don't reset device when it is accessible by the guest
When I destroyed a guest with 'xl destroy', I found that the warning in
msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)" to
that place, I got the call trace below:

(XEN) Xen call trace:
(XEN)    [] msi.c#msi_set_mask_bit+0x1da/0x29b
(XEN)    [] guest_mask_msi_irq+0x1c/0x1e
(XEN)    [] vmsi.c#msixtbl_write+0x173/0x1d4
(XEN)    [] vmsi.c#_msixtbl_write+0x16/0x18
(XEN)    [] hvm_process_io_intercept+0x216/0x270
(XEN)    [] hvm_io_intercept+0x27/0x4c
(XEN)    [] emulate.c#hvmemul_do_io+0x273/0x454
(XEN)    [] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
(XEN)    [] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
(XEN)    [] emulate.c#linear_write+0xdd/0x13b
(XEN)    [] emulate.c#hvmemul_write+0xbd/0xf1
(XEN)    [] x86_emulate+0x2249d/0x23c5c
(XEN)    [] x86_emulate_wrapper+0x2b/0x5f
(XEN)    [] emulate.c#_hvm_emulate_one+0x54/0x1b2
(XEN)    [] hvm_emulate_one+0x10/0x12
(XEN)    [] hvm_emulate_one_insn+0x42/0x14a
(XEN)    [] handle_mmio_with_translation+0x4f/0x51
(XEN)    [] hvm_hap_nested_page_fault+0x16c/0x6d8
(XEN)    [] vmx_vmexit_handler+0x19b0/0x1f2e
(XEN)    [] vmx_asm_vmexit_handler+0xfa/0x270

It seems to me that the guest is trying to mask an MSI while the memory
decoding of the device is disabled. Performing a device reset without a
proper method to prevent the guest's MSI-X operations leads to this
issue. The fix is basic - detach the pci device before resetting the
device.

Signed-off-by: Chao Gao
Reviewed-by: Roger Pau Monné
Acked-by: Wei Liu
---
 tools/libxl/libxl_pci.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..855fb71 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1459,17 +1459,17 @@ skip1:
         fclose(f);
     }
 out:
-    /* don't do multiple resets while some functions are still passed through */
-    if ( (pcidev->vdevfn & 0x7) == 0 ) {
-        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
-    }
-
     if (!isstubdom) {
         rc = xc_deassign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
        if (rc < 0 && (hvm || errno != ENOSYS))
             LOGED(ERROR, domainid, "xc_deassign_device failed");
     }

+    /* don't do multiple resets while some functions are still passed through */
+    if ( (pcidev->vdevfn & 0x7) == 0 ) {
+        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+    }
+
     stubdomid = libxl_get_stubdom_id(ctx, domid);
     if (stubdomid != 0) {
         libxl_device_pci pcidev_s = *pcidev;
--
1.8.3.1
[Xen-devel] [PATCH v6 3/3] xen/pt: initialize 'warned' field of arch_msix properly
Also clean up current code by moving initialization of arch specific
fields out of common code.

Signed-off-by: Chao Gao
Reviewed-by: Jan Beulich
Reviewed-by: Roger Pau Monné
---
Changes in v5:
 - rename init_arch_msix to arch_init_msix
 - place arch_init_msix right after the definition of arch_msix

Changes in v4:
 - newly added
---
 xen/drivers/passthrough/pci.c | 2 +-
 xen/include/asm-x86/msi.h | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index a347806..a56929a 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -367,7 +367,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             xfree(pdev);
             return NULL;
         }
-        spin_lock_init(&msix->table_lock);
+        arch_init_msix(msix);
         pdev->msix = msix;
     }

diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 10387dc..7b13c07 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -242,6 +242,12 @@ struct arch_msix {
     domid_t warned;
 };

+static inline void arch_init_msix(struct arch_msix *msix)
+{
+    spin_lock_init(&msix->table_lock);
+    msix->warned = DOMID_INVALID;
+}
+
 void early_msi_init(void);
 void msi_compose_msg(unsigned vector, const cpumask_t *mask,
                      struct msi_msg *msg);
--
1.8.3.1
[Xen-devel] [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
I find some pass-thru devices don't work any more across guest
reboot. Assigning such a device to another domain also meets the same
issue. And the only way to make it work again is un-binding and binding
it to pciback. Someone reported this issue one year ago [1].

If the device's driver doesn't disable MSI-X during shutdown or qemu is
killed/crashed before the domain shutdown, this domain's pirq won't be
unmapped. Then Xen takes over this work, unmapping all pirq-s, when
destroying the guest. But as pciback has already disabled memory decoding
before Xen unmaps the pirq, Xen has to set the host_maskall flag and the
maskall bit to mask an MSI rather than set the maskbit in the MSI-X
table. The call trace of this process is:

->arch_domain_destroy
->free_domain_pirqs
->unmap_domain_pirq (if pirq isn't unmapped by qemu)
->pirq_guest_force_unbind
->__pirq_guest_unbind
->mask_msi_irq(=desc->handler->disable())
->the warning in msi_set_mask_bit()

The host_maskall bit will prevent guests from clearing the maskall bit
even when the device is assigned to another guest later. Then guests
cannot receive MSIs from this device.

To fix this issue, a pirq is unmapped before memory decoding is disabled
by pciback. Specifically, when a device is detached from a guest, all
established mappings between pirq and msi are destroyed before changing
the ownership.

With this behavior, qemu and pciback are not aware of the forcible
unbinding and unmapping done by Xen. As a result, the state of a pirq
maintained by Xen and by pciback/qemu becomes inconsistent. Particularly
for the hot-plug/hot-unplug case, guests stay alive; such inconsistency
may cause other issues. To resolve this inconsistency and keep
compatibility with current qemu and pciback, two flags, force_unmapped
and force_unbound, are used to denote that a pirq was forcibly unmapped
or unbound. The flags are set when Xen unbinds or unmaps the pirq behind
qemu and pciback. Subsequent unbinding or unmapping requests from
qemu/pciback can then clear these flags and free the pirq.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Signed-off-by: Chao Gao
---
Changes in v6:
 - introduce flags to denote that a pirq has been forcibly
   unmapped/unbound. It helps to keep compatibility with current
   qemu/pciback.

Changes in v5:
 - fix the potential infinite loop
 - assert that unmap_domain_pirq() won't fail
 - assert msi_list is empty after the loop in pci_unmap_msi
 - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
   compilation error when building PVShim

Changes in v4:
 - split out change to 'msix->warned' field
 - handle multiple msi cases
 - use list_first_entry_or_null to traverse 'pdev->msi_list'
---
 xen/arch/x86/domctl.c | 6 +++-
 xen/arch/x86/irq.c | 54 ++---
 xen/drivers/passthrough/io.c | 81 +--
 xen/drivers/passthrough/pci.c | 61 
 xen/include/asm-x86/irq.h | 1 +
 xen/include/xen/iommu.h | 4 +++
 xen/include/xen/irq.h | 9 -
 7 files changed, 176 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9bf2d08..fb7dadc 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -732,7 +732,11 @@ long arch_do_domctl(
             break;

         ret = -EPERM;
-        if ( irq <= 0 || !irq_access_permitted(currd, irq) )
+        /*
+         * irq < 0 denotes the corresponding pirq has been forcibly unbound.
+         * For this case, bypass permission check to reap the pirq.
+         */
+        if ( !irq || ((irq > 0) && !irq_access_permitted(currd, irq)) )
             break;

         ret = xsm_unbind_pt_irq(XSM_HOOK, d, bind);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 23b4f42..fa533e1 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1345,10 +1345,8 @@ void (pirq_cleanup_check)(struct pirq *pirq, struct domain *d)
     /*
      * Check whether all fields have their default values, and delete
      * the entry from the tree if so.
-     *
-     * NB: Common parts were already checked.
      */
-    if ( pirq->arch.irq )
+    if ( pirq->force_unmapped || pirq->force_unbound || pirq->arch.irq )
         return;

     if ( is_hvm_domain(d) )
@@ -1582,6 +1580,13 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
     WARN_ON(!spin_is_locked(&v->domain->event_lock));
     BUG_ON(!local_irq_is_enabled());

+    if ( pirq->force_unmapped || pirq->force_unbound )
+    {
+        dprintk(XENLOG_G_ERR, "dom%d: forcibly unmapped/unbound pirq %d can't be bound\n",
+                v->domain->domain_id, pirq->pirq);
+        return -EINVAL;
+    }
+
  retry:
     desc = p
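The posting is truncated by the archive. From the description and the hunks above, the intended lifecycle of the new flags is roughly the following sketch; the fields live in struct pirq per the xen/include/xen/irq.h diffstat, while the exact clearing site is an assumption:

/* Set when Xen tears the mapping down behind qemu's/pciback's backs: */
pirq->force_unmapped = true;    /* in the forced unmap path  */
pirq->force_unbound  = true;    /* in the forced unbind path */

/* Later, when qemu/pciback issues the unmap/unbind it still believes
 * is pending, the stale flag is cleared and the pirq can be freed: */
if ( pirq->force_unmapped )
{
    pirq->force_unmapped = false;
    pirq_cleanup_check(pirq, d);  /* frees the entry once all fields are
                                   * back to their default values */
}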