Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common

2020-12-02 Thread Jan Beulich
On 01.12.2020 19:53, Oleksandr wrote:
> On 01.12.20 13:03, Alex Bennée wrote:
>> Oleksandr Tyshchenko  writes:
>>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>   if ( s->emulator != current->domain )
>>>   goto out;
>>>   
>>> -rc = p2m_set_ioreq_server(d, flags, s);
>>> +rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>>   
>>>out:
>>>   spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>>   
>>> -if ( rc == 0 && flags == 0 )
>>> -{
>>> -struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> -
>>> -if ( read_atomic(&p2m->ioreq.entry_count) )
>>> -p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>> -}
>>> -
>> It should be noted that p2m holds its own lock, but I'm unfamiliar with
>> Xen's locking architecture. Is there anything that prevents another vCPU
>> accessing a page that is also being used by ioreq on the first vCPU?
> I am not sure I can provide a reasonable explanation here.
> All I understand is that p2m_change_entry_type_global() is x86 
> specific (we don't have the p2m_ioreq_server concept on Arm) and should 
> remain as such (not exposed to the common code).
> IIRC, I raised a question during the V2 review whether we could hold the 
> ioreq server lock around the call to p2m_change_entry_type_global() and 
> didn't get objections.

Not getting objections doesn't mean much. Personally I don't recall
such a question, but this doesn't mean much. The important thing
here is that you properly justify this change in the description (I
didn't look at this version of the patch as a whole yet, so quite
likely you actually do). This is because you need to guarantee that
you don't introduce any lock order violations by this. There also
should be an attempt to avoid future introduction of issues, by
adding lock nesting related comments in suitable places. Again,
quite likely you actually do so, and I will notice it once looking
at the patch as a whole.

All of this said, I think it should be tried hard to avoid
introducing this extra lock nesting, if there aren't other places
already where the same nesting of locks is in effect.

> I may be mistaken, but it looks like the lock being used
> in p2m_change_entry_type_global() is yet another lock, protecting 
> page table operations, so it seems unlikely we could get into trouble 
> calling this function with the ioreq server lock held.

I'm afraid I don't understand the "yet another" here: The ioreq
server lock clearly serves an entirely different purpose.

Jan



Re: [PATCH] vpci/msix: exit early if MSI-X is disabled

2020-12-02 Thread Jan Beulich
On 01.12.2020 18:40, Roger Pau Monne wrote:
> Do not attempt to mask an MSI-X entry if MSI-X is not enabled. Else it
> will lead to hitting the following assert on debug builds:
> 
> (XEN) Panic on CPU 13:
> (XEN) Assertion 'entry->arch.pirq != INVALID_PIRQ' failed at vmsi.c:843

Since the line number is only of limited use, I'd like to see the
function name (vpci_msix_arch_mask_entry()) also added here; easily
done while committing, if the question further down can be resolved
without code change.

> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -357,7 +357,11 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>   * so that it picks the new state.
>   */
>  entry->masked = new_masked;
> -if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
> +
> +if ( !msix->enabled )
> +break;
> +
> +if ( !new_masked && !msix->masked && entry->updated )
>  {
>  /*
>   * If MSI-X is enabled, the function mask is not active, the entry

What about a "disabled" -> "enabled-but-masked" transition? This,
afaict, similarly won't trigger setting up of entries from
control_write(), and hence I'd expect the ASSERT() to similarly
trigger when subsequently an entry's mask bit gets altered.

I'd also be fine making this further adjustment, if you agree,
but the one thing I haven't been able to fully convince myself of
is that there's then still no need to set ->updated to true.

Jan



Re: [PATCH v2 06/17] xen/cpupool: use ERR_PTR() for returning error cause from cpupool_create()

2020-12-02 Thread Jan Beulich
On 01.12.2020 09:21, Juergen Gross wrote:
> Instead of a pointer to an error variable as parameter just use
> ERR_PTR() to return the cause of an error in cpupool_create().
> 
> This propagates to scheduler_alloc(), too.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 
with one small question:

> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -3233,26 +3233,25 @@ struct scheduler *scheduler_get_default(void)
>  return &ops;
>  }
>  
> -struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr)
> +struct scheduler *scheduler_alloc(unsigned int sched_id)
>  {
>  int i;
> +int ret;

I guess you didn't merge this with i's declaration because of a
plan/hope for i to be converted to unsigned int?

Jan



Re: [PATCH] Fix spelling errors.

2020-12-02 Thread Jan Beulich
On 30.11.2020 18:39, Diederik de Haas wrote:
> Only spelling errors; no functional changes.
> 
> In docs/misc/dump-core-format.txt there are a few more instances of
> 'informations'. I'll leave that up to someone who can properly determine
> how those sentences should be constructed.
> 
> Signed-off-by: Diederik de Haas 
> 
> Please CC me in replies as I'm not subscribed to this list.
> ---
>  docs/man/xl.1.pod.in   | 2 +-
>  docs/man/xl.cfg.5.pod.in   | 2 +-
>  docs/man/xlcpupool.cfg.5.pod   | 2 +-
>  tools/firmware/rombios/rombios.c   | 2 +-
>  tools/libs/light/libxl_stream_read.c   | 2 +-
>  tools/xl/xl_cmdtable.c | 2 +-

Since these are trivial and obvious adjustments, I intend to not wait
very long for an xl/libxl side ack, before deciding to commit this.
Perhaps just another day or so.

Jan



[PATCH v4 01/11] viridian: don't blindly write to 32-bit registers if 'mode' is invalid

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

If hvm_guest_x86_mode() returns something other than 8 or 4 then
viridian_hypercall() will return immediately but, on the way out, will write
back status as if 'mode' was 4. This patch simply makes it leave the registers
alone.

NOTE: The formatting of the 'out' label and the switch statement is also
  adjusted as per CODING_STYLE.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v4:
 - Fixed another CODING_STYLE violation.

v2:
 - New in v2
---
 xen/arch/x86/hvm/viridian/viridian.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index dc7183a54627..338e705bd29c 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -692,13 +692,15 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 break;
 }
 
-out:
+ out:
 output.result = status;
-switch (mode) {
+switch (mode)
+{
 case 8:
 regs->rax = output.raw;
 break;
-default:
+
+case 4:
 regs->rdx = output.raw >> 32;
 regs->rax = (uint32_t)output.raw;
 break;
-- 
2.20.1




[PATCH v4 00/11] viridian: add support for ExProcessorMasks

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

Paul Durrant (11):
  viridian: don't blindly write to 32-bit registers if 'mode' is invalid
  viridian: move flush hypercall implementation into separate function
  viridian: move IPI hypercall implementation into separate function
  viridian: introduce a per-cpu hypercall_vpmask and accessor
functions...
  viridian: use hypercall_vpmask in hvcall_ipi()
  viridian: use softirq batching in hvcall_ipi()
  viridian: add ExProcessorMasks variants of the flush hypercalls
  viridian: add ExProcessorMasks variant of the IPI hypercall
  viridian: log initial invocation of each type of hypercall
  viridian: add a new '_HVMPV_ex_processor_masks' bit into
HVM_PARAM_VIRIDIAN...
  xl / libxl: add 'ex_processor_mask' into
'libxl_viridian_enlightenment'

 docs/man/xl.cfg.5.pod.in |   8 +
 tools/include/libxl.h|   7 +
 tools/libs/light/libxl_types.idl |   1 +
 tools/libs/light/libxl_x86.c |   3 +
 xen/arch/x86/hvm/viridian/viridian.c | 604 +--
 xen/include/asm-x86/hvm/viridian.h   |  10 +
 xen/include/public/hvm/params.h  |   7 +-
 7 files changed, 516 insertions(+), 124 deletions(-)

-- 
2.20.1




[PATCH v4 04/11] viridian: introduce a per-cpu hypercall_vpmask and accessor functions...

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

... and make use of them in hvcall_flush()/need_flush().

Subsequent patches will need to deal with virtual processor masks potentially
wider than 64 bits. Thus, to avoid using too much stack, this patch
introduces global per-cpu virtual processor masks and converts the
implementation of hvcall_flush() to use them.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v2:
 - Modified vpmask_set() to take a base 'vp' and a 64-bit 'mask', still
   looping over the mask as bitmap.h does not provide a primitive for copying
   one mask into another at an offset
 - Added ASSERTions to verify that we don't attempt to set or test bits
   beyond the limit of the map
---
 xen/arch/x86/hvm/viridian/viridian.c | 58 ++--
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index 9844bb57872a..cfd87504052f 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -507,15 +507,59 @@ void viridian_domain_deinit(struct domain *d)
 XFREE(d->arch.hvm.viridian);
 }
 
+struct hypercall_vpmask {
+DECLARE_BITMAP(mask, HVM_MAX_VCPUS);
+};
+
+static DEFINE_PER_CPU(struct hypercall_vpmask, hypercall_vpmask);
+
+static void vpmask_empty(struct hypercall_vpmask *vpmask)
+{
+bitmap_zero(vpmask->mask, HVM_MAX_VCPUS);
+}
+
+static void vpmask_set(struct hypercall_vpmask *vpmask, unsigned int vp,
+   uint64_t mask)
+{
+unsigned int count = sizeof(mask) * 8;
+
+while ( count-- )
+{
+if ( !mask )
+break;
+
+if ( mask & 1 )
+{
+ASSERT(vp < HVM_MAX_VCPUS);
+__set_bit(vp, vpmask->mask);
+}
+
+mask >>= 1;
+vp++;
+}
+}
+
+static void vpmask_fill(struct hypercall_vpmask *vpmask)
+{
+bitmap_fill(vpmask->mask, HVM_MAX_VCPUS);
+}
+
+static bool vpmask_test(const struct hypercall_vpmask *vpmask,
+unsigned int vp)
+{
+ASSERT(vp < HVM_MAX_VCPUS);
+return test_bit(vp, vpmask->mask);
+}
+
 /*
  * Windows should not issue the hypercalls requiring this callback in the
  * case where vcpu_id would exceed the size of the mask.
  */
 static bool need_flush(void *ctxt, struct vcpu *v)
 {
-uint64_t vcpu_mask = *(uint64_t *)ctxt;
+struct hypercall_vpmask *vpmask = ctxt;
 
-return vcpu_mask & (1ul << v->vcpu_id);
+return vpmask_test(vpmask, v->vcpu_id);
 }
 
 union hypercall_input {
@@ -546,6 +590,7 @@ static int hvcall_flush(const union hypercall_input *input,
 paddr_t input_params_gpa,
 paddr_t output_params_gpa)
 {
+struct hypercall_vpmask *vpmask = &this_cpu(hypercall_vpmask);
 struct {
 uint64_t address_space;
 uint64_t flags;
@@ -567,13 +612,18 @@ static int hvcall_flush(const union hypercall_input *input,
  * so err on the safe side.
  */
 if ( input_params.flags & HV_FLUSH_ALL_PROCESSORS )
-input_params.vcpu_mask = ~0ul;
+vpmask_fill(vpmask);
+else
+{
+vpmask_empty(vpmask);
+vpmask_set(vpmask, 0, input_params.vcpu_mask);
+}
 
 /*
  * A false return means that another vcpu is currently trying
  * a similar operation, so back off.
  */
-if ( !paging_flush_tlb(need_flush, &input_params.vcpu_mask) )
+if ( !paging_flush_tlb(need_flush, vpmask) )
 return -ERESTART;
 
 output->rep_complete = input->rep_count;
-- 
2.20.1




[PATCH v4 05/11] viridian: use hypercall_vpmask in hvcall_ipi()

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

A subsequent patch will need to IPI a mask of virtual processors potentially
wider than 64 bits. A previous patch introduced per-cpu hypercall_vpmask
to allow hvcall_flush() to deal with such wide masks. This patch modifies
the implementation of hvcall_ipi() to make use of the same mask structures,
introducing a for_each_vp() macro to facilitate traversing a mask.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v3:
 - Couple of extra 'const' qualifiers

v2:
 - Drop the 'vp' loop now that vpmask_set() will do it internally
---
 xen/arch/x86/hvm/viridian/viridian.c | 44 +---
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index cfd87504052f..fb38210e2cc7 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -551,6 +551,26 @@ static bool vpmask_test(const struct hypercall_vpmask *vpmask,
 return test_bit(vp, vpmask->mask);
 }
 
+static unsigned int vpmask_first(const struct hypercall_vpmask *vpmask)
+{
+return find_first_bit(vpmask->mask, HVM_MAX_VCPUS);
+}
+
+static unsigned int vpmask_next(const struct hypercall_vpmask *vpmask,
+unsigned int vp)
+{
+/*
+ * If vp + 1 > HVM_MAX_VCPUS then find_next_bit() will return
+ * HVM_MAX_VCPUS, ensuring the for_each_vp ( ... ) loop terminates.
+ */
+return find_next_bit(vpmask->mask, HVM_MAX_VCPUS, vp + 1);
+}
+
+#define for_each_vp(vpmask, vp) \
+   for ( (vp) = vpmask_first(vpmask); \
+ (vp) < HVM_MAX_VCPUS; \
+ (vp) = vpmask_next(vpmask, vp) )
+
 /*
  * Windows should not issue the hypercalls requiring this callback in the
  * case where vcpu_id would exceed the size of the mask.
@@ -631,13 +651,21 @@ static int hvcall_flush(const union hypercall_input *input,
 return 0;
 }
 
+static void send_ipi(struct hypercall_vpmask *vpmask, uint8_t vector)
+{
+struct domain *currd = current->domain;
+unsigned int vp;
+
+for_each_vp ( vpmask, vp )
+vlapic_set_irq(vcpu_vlapic(currd->vcpu[vp]), vector, 0);
+}
+
 static int hvcall_ipi(const union hypercall_input *input,
   union hypercall_output *output,
   paddr_t input_params_gpa,
   paddr_t output_params_gpa)
 {
-struct domain *currd = current->domain;
-struct vcpu *v;
+struct hypercall_vpmask *vpmask = &this_cpu(hypercall_vpmask);
 uint32_t vector;
 uint64_t vcpu_mask;
 
@@ -676,16 +704,10 @@ static int hvcall_ipi(const union hypercall_input *input,
 if ( vector < 0x10 || vector > 0xff )
 return -EINVAL;
 
-for_each_vcpu ( currd, v )
-{
-if ( v->vcpu_id >= (sizeof(vcpu_mask) * 8) )
-return -EINVAL;
+vpmask_empty(vpmask);
+vpmask_set(vpmask, 0, vcpu_mask);
 
-if ( !(vcpu_mask & (1ul << v->vcpu_id)) )
-continue;
-
-vlapic_set_irq(vcpu_vlapic(v), vector, 0);
-}
+send_ipi(vpmask, vector);
 
 return 0;
 }
-- 
2.20.1




[PATCH v4 03/11] viridian: move IPI hypercall implementation into separate function

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

This patch moves the implementation of HVCALL_SEND_IPI that is currently
inline in viridian_hypercall() into a new hvcall_ipi() function.

The new function returns Xen errno values similarly to hvcall_flush(). Hence
the errno translation code in viridian_hypercall() is generalized, removing
the need for the local 'status' variable.

NOTE: The formatting of the switch statement at the top of
  viridian_hypercall() is also adjusted as per CODING_STYLE.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v3:
 - Adjust prototype of new function

v2:
 - Different formatting adjustment
---
 xen/arch/x86/hvm/viridian/viridian.c | 168 ++-
 1 file changed, 87 insertions(+), 81 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index 69b6f285e8aa..9844bb57872a 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -581,13 +581,72 @@ static int hvcall_flush(const union hypercall_input *input,
 return 0;
 }
 
+static int hvcall_ipi(const union hypercall_input *input,
+  union hypercall_output *output,
+  paddr_t input_params_gpa,
+  paddr_t output_params_gpa)
+{
+struct domain *currd = current->domain;
+struct vcpu *v;
+uint32_t vector;
+uint64_t vcpu_mask;
+
+/* Get input parameters. */
+if ( input->fast )
+{
+if ( input_params_gpa >> 32 )
+return -EINVAL;
+
+vector = input_params_gpa;
+vcpu_mask = output_params_gpa;
+}
+else
+{
+struct {
+uint32_t vector;
+uint8_t target_vtl;
+uint8_t reserved_zero[3];
+uint64_t vcpu_mask;
+} input_params;
+
+if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
+  sizeof(input_params)) != HVMTRANS_okay )
+return -EINVAL;
+
+if ( input_params.target_vtl ||
+ input_params.reserved_zero[0] ||
+ input_params.reserved_zero[1] ||
+ input_params.reserved_zero[2] )
+return -EINVAL;
+
+vector = input_params.vector;
+vcpu_mask = input_params.vcpu_mask;
+}
+
+if ( vector < 0x10 || vector > 0xff )
+return -EINVAL;
+
+for_each_vcpu ( currd, v )
+{
+if ( v->vcpu_id >= (sizeof(vcpu_mask) * 8) )
+return -EINVAL;
+
+if ( !(vcpu_mask & (1ul << v->vcpu_id)) )
+continue;
+
+vlapic_set_irq(vcpu_vlapic(v), vector, 0);
+}
+
+return 0;
+}
+
 int viridian_hypercall(struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
 struct domain *currd = curr->domain;
 int mode = hvm_guest_x86_mode(curr);
 unsigned long input_params_gpa, output_params_gpa;
-uint16_t status = HV_STATUS_SUCCESS;
+int rc = 0;
 union hypercall_input input;
 union hypercall_output output = {};
 
@@ -600,11 +659,13 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 input_params_gpa = regs->rdx;
 output_params_gpa = regs->r8;
 break;
+
 case 4:
 input.raw = (regs->rdx << 32) | regs->eax;
 input_params_gpa = (regs->rbx << 32) | regs->ecx;
 output_params_gpa = (regs->rdi << 32) | regs->esi;
 break;
+
 default:
 goto out;
 }
@@ -616,92 +677,18 @@ int viridian_hypercall(struct cpu_user_regs *regs)
  * See section 14.5.1 of the specification.
  */
 do_sched_op(SCHEDOP_yield, guest_handle_from_ptr(NULL, void));
-status = HV_STATUS_SUCCESS;
 break;
 
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
-{
-int rc = hvcall_flush(&input, &output, input_params_gpa,
-  output_params_gpa);
-
-switch ( rc )
-{
-case 0:
-break;
-
-case -ERESTART:
-return HVM_HCALL_preempted;
-
-default:
-ASSERT_UNREACHABLE();
-/* Fallthrough */
-case -EINVAL:
-status = HV_STATUS_INVALID_PARAMETER;
-break;
-}
-
+rc = hvcall_flush(&input, &output, input_params_gpa,
+  output_params_gpa);
 break;
-}
 
 case HVCALL_SEND_IPI:
-{
-struct vcpu *v;
-uint32_t vector;
-uint64_t vcpu_mask;
-
-status = HV_STATUS_INVALID_PARAMETER;
-
-/* Get input parameters. */
-if ( input.fast )
-{
-if ( input_params_gpa >> 32 )
-break;
-
-vector = input_params_gpa;
-vcpu_mask = output_params_gpa;
-}
-else
-{
-struct {
-uint32_t vector;
-uint8_t target_vtl;
-uint8_t reserved_zero[3];
-  

[PATCH v4 02/11] viridian: move flush hypercall implementation into separate function

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

This patch moves the implementation of HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST
that is currently inline in viridian_hypercall() into a new hvcall_flush()
function.

The new function returns Xen errno values which are then dealt with
appropriately. A return value of -ERESTART translates to viridian_hypercall()
returning HVM_HCALL_preempted. Other return values translate to status codes
and viridian_hypercall() returning HVM_HCALL_completed. Currently the only
values, other than -ERESTART, returned by hvcall_flush() are 0 (indicating
success) or -EINVAL.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v3:
 - Adjust prototype of new function
---
 xen/arch/x86/hvm/viridian/viridian.c | 130 ---
 1 file changed, 78 insertions(+), 52 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index 338e705bd29c..69b6f285e8aa 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -518,6 +518,69 @@ static bool need_flush(void *ctxt, struct vcpu *v)
 return vcpu_mask & (1ul << v->vcpu_id);
 }
 
+union hypercall_input {
+uint64_t raw;
+struct {
+uint16_t call_code;
+uint16_t fast:1;
+uint16_t rsvd1:15;
+uint16_t rep_count:12;
+uint16_t rsvd2:4;
+uint16_t rep_start:12;
+uint16_t rsvd3:4;
+};
+};
+
+union hypercall_output {
+uint64_t raw;
+struct {
+uint16_t result;
+uint16_t rsvd1;
+uint32_t rep_complete:12;
+uint32_t rsvd2:20;
+};
+};
+
+static int hvcall_flush(const union hypercall_input *input,
+union hypercall_output *output,
+paddr_t input_params_gpa,
+paddr_t output_params_gpa)
+{
+struct {
+uint64_t address_space;
+uint64_t flags;
+uint64_t vcpu_mask;
+} input_params;
+
+/* These hypercalls should never use the fast-call convention. */
+if ( input->fast )
+return -EINVAL;
+
+/* Get input parameters. */
+if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
+  sizeof(input_params)) != HVMTRANS_okay )
+return -EINVAL;
+
+/*
+ * It is not clear from the spec. if we are supposed to
+ * include current virtual CPU in the set or not in this case,
+ * so err on the safe side.
+ */
+if ( input_params.flags & HV_FLUSH_ALL_PROCESSORS )
+input_params.vcpu_mask = ~0ul;
+
+/*
+ * A false return means that another vcpu is currently trying
+ * a similar operation, so back off.
+ */
+if ( !paging_flush_tlb(need_flush, &input_params.vcpu_mask) )
+return -ERESTART;
+
+output->rep_complete = input->rep_count;
+
+return 0;
+}
+
 int viridian_hypercall(struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
@@ -525,29 +588,8 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 int mode = hvm_guest_x86_mode(curr);
 unsigned long input_params_gpa, output_params_gpa;
 uint16_t status = HV_STATUS_SUCCESS;
-
-union hypercall_input {
-uint64_t raw;
-struct {
-uint16_t call_code;
-uint16_t fast:1;
-uint16_t rsvd1:15;
-uint16_t rep_count:12;
-uint16_t rsvd2:4;
-uint16_t rep_start:12;
-uint16_t rsvd3:4;
-};
-} input;
-
-union hypercall_output {
-uint64_t raw;
-struct {
-uint16_t result;
-uint16_t rsvd1;
-uint32_t rep_complete:12;
-uint32_t rsvd2:20;
-};
-} output = { 0 };
+union hypercall_input input;
+union hypercall_output output = {};
 
 ASSERT(is_viridian_domain(currd));
 
@@ -580,41 +622,25 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
 {
-struct {
-uint64_t address_space;
-uint64_t flags;
-uint64_t vcpu_mask;
-} input_params;
+int rc = hvcall_flush(&input, &output, input_params_gpa,
+  output_params_gpa);
 
-/* These hypercalls should never use the fast-call convention. */
-status = HV_STATUS_INVALID_PARAMETER;
-if ( input.fast )
+switch ( rc )
+{
+case 0:
 break;
 
-/* Get input parameters. */
-if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
-  sizeof(input_params)) !=
- HVMTRANS_okay )
-break;
-
-/*
- * It is not clear from the spec. if we are supposed to
- * include current virtual CPU in the set or not in this case,
- * so err on the safe side.
- */
-if ( input_params.flags & HV_FLUS

[PATCH v4 06/11] viridian: use softirq batching in hvcall_ipi()

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

vlapic_ipi() uses a softirq batching mechanism to improve the efficiency of
sending IPIs to a large number of processors. This patch modifies send_ipi()
(the worker function called by hvcall_ipi()) to also make use of the
mechanism when there are multiple bits set in the hypercall_vpmask.

Signed-off-by: Paul Durrant 
Reviewed-by: Jan Beulich 
---
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v2:
 - Don't add the 'nr' field to struct hypercall_vpmask and use
   bitmap_weight() instead
---
 xen/arch/x86/hvm/viridian/viridian.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index fb38210e2cc7..47f15717bcd3 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -571,6 +572,11 @@ static unsigned int vpmask_next(const struct hypercall_vpmask *vpmask,
  (vp) < HVM_MAX_VCPUS; \
  (vp) = vpmask_next(vpmask, vp) )
 
+static unsigned int vpmask_nr(const struct hypercall_vpmask *vpmask)
+{
+return bitmap_weight(vpmask->mask, HVM_MAX_VCPUS);
+}
+
 /*
  * Windows should not issue the hypercalls requiring this callback in the
  * case where vcpu_id would exceed the size of the mask.
@@ -654,10 +660,17 @@ static int hvcall_flush(const union hypercall_input *input,
 static void send_ipi(struct hypercall_vpmask *vpmask, uint8_t vector)
 {
 struct domain *currd = current->domain;
+unsigned int nr = vpmask_nr(vpmask);
 unsigned int vp;
 
+if ( nr > 1 )
+cpu_raise_softirq_batch_begin();
+
 for_each_vp ( vpmask, vp )
 vlapic_set_irq(vcpu_vlapic(currd->vcpu[vp]), vector, 0);
+
+if ( nr > 1 )
+cpu_raise_softirq_batch_finish();
 }
 
 static int hvcall_ipi(const union hypercall_input *input,
-- 
2.20.1




[PATCH v4 08/11] viridian: add ExProcessorMasks variant of the IPI hypercall

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

A previous patch introduced variants of the flush hypercalls that take a
'Virtual Processor Set' as an argument rather than a simple 64-bit mask.
This patch introduces a similar variant of the HVCALL_SEND_IPI hypercall
(HVCALL_SEND_IPI_EX).

NOTE: As with HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST_EX, a guest should
  not yet issue the HVCALL_SEND_IPI_EX hypercall as support for
  'ExProcessorMasks' is not yet advertised via CPUID.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v3:
 - Adjust prototype of new function

v2:
 - Sanity check size before hvm_copy_from_guest_phys()
---
 xen/arch/x86/hvm/viridian/viridian.c | 74 
 1 file changed, 74 insertions(+)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index fceca760b41d..9aa8e6c2572c 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -860,6 +860,75 @@ static int hvcall_ipi(const union hypercall_input *input,
 return 0;
 }
 
+static int hvcall_ipi_ex(const union hypercall_input *input,
+ union hypercall_output *output,
+ paddr_t input_params_gpa,
+ paddr_t output_params_gpa)
+{
+struct hypercall_vpmask *vpmask = &this_cpu(hypercall_vpmask);
+struct {
+uint32_t vector;
+uint8_t target_vtl;
+uint8_t reserved_zero[3];
+struct hv_vpset set;
+} input_params;
+union hypercall_vpset *vpset = &this_cpu(hypercall_vpset);
+struct hv_vpset *set = &vpset->set;
+size_t size;
+int rc;
+
+/* These hypercalls should never use the fast-call convention. */
+if ( input->fast )
+return -EINVAL;
+
+/* Get input parameters. */
+if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
+  sizeof(input_params)) != HVMTRANS_okay )
+return -EINVAL;
+
+if ( input_params.target_vtl ||
+ input_params.reserved_zero[0] ||
+ input_params.reserved_zero[1] ||
+ input_params.reserved_zero[2] )
+return HV_STATUS_INVALID_PARAMETER;
+
+if ( input_params.vector < 0x10 || input_params.vector > 0xff )
+return HV_STATUS_INVALID_PARAMETER;
+
+*set = input_params.set;
+if ( set->format == HV_GENERIC_SET_SPARSE_4K )
+{
+unsigned long offset = offsetof(typeof(input_params),
+set.bank_contents);
+
+size = sizeof(*set->bank_contents) * hv_vpset_nr_banks(set);
+
+if ( offsetof(typeof(*vpset), set.bank_contents[0]) + size >
+ sizeof(*vpset) )
+{
+ASSERT_UNREACHABLE();
+return -EINVAL;
+}
+
+if ( hvm_copy_from_guest_phys(&set->bank_contents,
+  input_params_gpa + offset,
+  size) != HVMTRANS_okay)
+return -EINVAL;
+
+size += sizeof(*set);
+}
+else
+size = sizeof(*set);
+
+rc = hv_vpset_to_vpmask(set, vpmask);
+if ( rc )
+return rc;
+
+send_ipi(vpmask, input_params.vector);
+
+return 0;
+}
+
 int viridian_hypercall(struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
@@ -916,6 +985,11 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 output_params_gpa);
 break;
 
+case HVCALL_SEND_IPI_EX:
+rc = hvcall_ipi_ex(&input, &output, input_params_gpa,
+   output_params_gpa);
+break;
+
 default:
 gprintk(XENLOG_WARNING, "unimplemented hypercall %04x\n",
 input.call_code);
-- 
2.20.1




[PATCH v4 09/11] viridian: log initial invocation of each type of hypercall

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

To make it simpler to observe which viridian hypercalls are issued by a
particular Windows guest, this patch adds a per-domain mask to track them.
Each type of hypercall causes a different bit to be set in the mask and
when the bit transitions from clear to set, a log line is emitted showing
the name of the hypercall and the domain that issued it.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v2:
 - Use DECLARE_BITMAP() for 'hypercall_flags'
 - Use an enum for _HCALL_* values
---
 xen/arch/x86/hvm/viridian/viridian.c | 21 +
 xen/include/asm-x86/hvm/viridian.h   | 10 ++
 2 files changed, 31 insertions(+)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index 9aa8e6c2572c..95314c1f7f6c 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -933,6 +933,7 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
 struct domain *currd = curr->domain;
+struct viridian_domain *vd = currd->arch.hvm.viridian;
 int mode = hvm_guest_x86_mode(curr);
 unsigned long input_params_gpa, output_params_gpa;
 int rc = 0;
@@ -962,6 +963,10 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 switch ( input.call_code )
 {
 case HVCALL_NOTIFY_LONG_SPIN_WAIT:
+if ( !test_and_set_bit(_HCALL_spin_wait, vd->hypercall_flags) )
+printk(XENLOG_G_INFO "d%d: VIRIDIAN HVCALL_NOTIFY_LONG_SPIN_WAIT\n",
+   currd->domain_id);
+
 /*
  * See section 14.5.1 of the specification.
  */
@@ -970,22 +975,38 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
+if ( !test_and_set_bit(_HCALL_flush, vd->hypercall_flags) )
+printk(XENLOG_G_INFO "%pd: VIRIDIAN HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST\n",
+   currd);
+
 rc = hvcall_flush(&input, &output, input_params_gpa,
   output_params_gpa);
 break;
 
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
 case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
+if ( !test_and_set_bit(_HCALL_flush_ex, vd->hypercall_flags) )
+printk(XENLOG_G_INFO "%pd: VIRIDIAN HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST_EX\n",
+   currd);
+
 rc = hvcall_flush_ex(&input, &output, input_params_gpa,
  output_params_gpa);
 break;
 
 case HVCALL_SEND_IPI:
+if ( !test_and_set_bit(_HCALL_ipi, vd->hypercall_flags) )
+printk(XENLOG_G_INFO "%pd: VIRIDIAN HVCALL_SEND_IPI\n",
+   currd);
+
 rc = hvcall_ipi(&input, &output, input_params_gpa,
 output_params_gpa);
 break;
 
 case HVCALL_SEND_IPI_EX:
+if ( !test_and_set_bit(_HCALL_ipi_ex, vd->hypercall_flags) )
+printk(XENLOG_G_INFO "%pd: VIRIDIAN HVCALL_SEND_IPI_EX\n",
+   currd);
+
 rc = hvcall_ipi_ex(&input, &output, input_params_gpa,
output_params_gpa);
 break;
diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-x86/hvm/viridian.h
index cbf77d9c760b..4c8ff6e80b6f 100644
--- a/xen/include/asm-x86/hvm/viridian.h
+++ b/xen/include/asm-x86/hvm/viridian.h
@@ -55,10 +55,20 @@ struct viridian_time_ref_count
 int64_t off;
 };
 
+enum {
+_HCALL_spin_wait,
+_HCALL_flush,
+_HCALL_flush_ex,
+_HCALL_ipi,
+_HCALL_ipi_ex,
+_HCALL_nr /* must be last */
+};
+
 struct viridian_domain
 {
 union hv_guest_os_id guest_os_id;
 union hv_vp_assist_page_msr hypercall_gpa;
+DECLARE_BITMAP(hypercall_flags, _HCALL_nr);
 struct viridian_time_ref_count time_ref_count;
 struct viridian_page reference_tsc;
 };
-- 
2.20.1




[PATCH v4 07/11] viridian: add ExProcessorMasks variants of the flush hypercalls

2020-12-02 Thread Paul Durrant
From: Paul Durrant 

The Microsoft Hypervisor TLFS specifies variants of the already implemented
HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST hypercalls that take a 'Virtual
Processor Set' as an argument rather than a simple 64-bit mask.

This patch adds a new hvcall_flush_ex() function to implement these
(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST_EX) hypercalls. This makes use of
new helper functions, hv_vpset_nr_banks() and hv_vpset_to_vpmask(), to
determine the size of the Virtual Processor Set (so it can be copied from
guest memory) and parse it into hypercall_vpmask (respectively).

NOTE: A guest should not yet issue these hypercalls as 'ExProcessorMasks'
  support needs to be advertised via CPUID. This will be done in a
  subsequent patch.

Signed-off-by: Paul Durrant 
---
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: "Roger Pau Monné" 

v3:
 - Adjust one of the helper macros
 - A few more consts and type tweaks
 - Adjust prototype of new function

v2:
 - Add helper macros to define mask and struct sizes
 - Use a union to determine the size of 'hypercall_vpset'
 - Use hweight64() in hv_vpset_nr_banks()
 - Sanity check size before hvm_copy_from_guest_phys()
---
 xen/arch/x86/hvm/viridian/viridian.c | 141 +++
 1 file changed, 141 insertions(+)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index 47f15717bcd3..fceca760b41d 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -577,6 +577,69 @@ static unsigned int vpmask_nr(const struct hypercall_vpmask *vpmask)
 return bitmap_weight(vpmask->mask, HVM_MAX_VCPUS);
 }
 
+#define HV_VPSET_BANK_SIZE \
+sizeof_field(struct hv_vpset, bank_contents[0])
+
+#define HV_VPSET_SIZE(banks)   \
+(offsetof(struct hv_vpset, bank_contents) + \
+ ((banks) * HV_VPSET_BANK_SIZE))
+
+#define HV_VPSET_MAX_BANKS \
+(sizeof_field(struct hv_vpset, valid_bank_mask) * 8)
+
+union hypercall_vpset {
+struct hv_vpset set;
+uint8_t pad[HV_VPSET_SIZE(HV_VPSET_MAX_BANKS)];
+};
+
+static DEFINE_PER_CPU(union hypercall_vpset, hypercall_vpset);
+
+static unsigned int hv_vpset_nr_banks(struct hv_vpset *vpset)
+{
+return hweight64(vpset->valid_bank_mask);
+}
+
+static uint16_t hv_vpset_to_vpmask(const struct hv_vpset *set,
+   struct hypercall_vpmask *vpmask)
+{
+#define NR_VPS_PER_BANK (HV_VPSET_BANK_SIZE * 8)
+
+switch ( set->format )
+{
+case HV_GENERIC_SET_ALL:
+vpmask_fill(vpmask);
+return 0;
+
+case HV_GENERIC_SET_SPARSE_4K:
+{
+uint64_t bank_mask;
+unsigned int vp, bank = 0;
+
+vpmask_empty(vpmask);
+for ( vp = 0, bank_mask = set->valid_bank_mask;
+  bank_mask;
+  vp += NR_VPS_PER_BANK, bank_mask >>= 1 )
+{
+if ( bank_mask & 1 )
+{
+uint64_t mask = set->bank_contents[bank];
+
+vpmask_set(vpmask, vp, mask);
+bank++;
+}
+}
+return 0;
+}
+
+default:
+break;
+}
+
+return -EINVAL;
+
+#undef NR_VPS_PER_BANK
+}
+
 /*
  * Windows should not issue the hypercalls requiring this callback in the
  * case where vcpu_id would exceed the size of the mask.
@@ -657,6 +720,78 @@ static int hvcall_flush(const union hypercall_input *input,
 return 0;
 }
 
+static int hvcall_flush_ex(const union hypercall_input *input,
+   union hypercall_output *output,
+   paddr_t input_params_gpa,
+   paddr_t output_params_gpa)
+{
+struct hypercall_vpmask *vpmask = &this_cpu(hypercall_vpmask);
+struct {
+uint64_t address_space;
+uint64_t flags;
+struct hv_vpset set;
+} input_params;
+
+/* These hypercalls should never use the fast-call convention. */
+if ( input->fast )
+return -EINVAL;
+
+/* Get input parameters. */
+if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
+  sizeof(input_params)) != HVMTRANS_okay )
+return -EINVAL;
+
+if ( input_params.flags & HV_FLUSH_ALL_PROCESSORS )
+vpmask_fill(vpmask);
+else
+{
+union hypercall_vpset *vpset = &this_cpu(hypercall_vpset);
+struct hv_vpset *set = &vpset->set;
+size_t size;
+int rc;
+
+*set = input_params.set;
+if ( set->format == HV_GENERIC_SET_SPARSE_4K )
+{
+unsigned long offset = offsetof(typeof(input_params),
+set.bank_contents);
+
+size = sizeof(*set->bank_contents) * hv_vpset_nr_banks(set);
+
+if ( offsetof(typeof(*vpset), set.bank_contents[0]) + size >
+ sizeof(*vpset) )
+{
+ASSERT_UNREACHABLE();
+return -EINVAL;
+}
+
+if ( hvm_copy_from_guest_p
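The sparse-4K handling in hv_vpset_to_vpmask() above can be sketched in self-contained form: valid_bank_mask says which 64-VP "banks" are present, bank_contents[] holds one 64-bit mask per *present* bank, and the bank index only advances for present banks. Names and the fixed 256-VP flat mask below are illustrative, not Xen's:

```c
#include <assert.h>
#include <stdint.h>

#define NR_VPS_PER_BANK 64

/* Equivalent of hweight64(): one bank per set bit of valid_bank_mask. */
static unsigned int vpset_nr_banks(uint64_t valid_bank_mask)
{
    unsigned int n = 0;

    while ( valid_bank_mask )
    {
        n += valid_bank_mask & 1;
        valid_bank_mask >>= 1;
    }
    return n;
}

/* Expand a sparse VP set into a flat per-64-VP mask array. */
static void vpset_to_flat(uint64_t valid_bank_mask,
                          const uint64_t *bank_contents,
                          uint64_t flat[4] /* up to 256 VPs */)
{
    uint64_t bank_mask;
    unsigned int vp, bank = 0;

    for ( vp = 0, bank_mask = valid_bank_mask;
          bank_mask;
          vp += NR_VPS_PER_BANK, bank_mask >>= 1 )
    {
        if ( bank_mask & 1 )
            flat[vp / NR_VPS_PER_BANK] = bank_contents[bank++];
    }
}
```

vpset_nr_banks() is what lets the hypercall copy only the populated tail of the guest's set, which is why the patch sizes the copy before calling hvm_copy_from_guest_phys().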

Re: [PATCH] x86/IRQ: bump max number of guests for a shared IRQ to 31

2020-12-02 Thread Jan Beulich
On 01.12.2020 00:59, Igor Druzhinin wrote:
> Current limit of 7 is too restrictive for modern systems where one GSI
> could be shared by potentially many PCI INTx sources where each of them
> corresponds to a device passed through to its own guest. Some systems do not
> apply due diligence in swizzling INTx links in case e.g. INTA is declared as
> interrupt pin for the majority of PCI devices behind a single router,
> resulting in overuse of a GSI.
> 
> Signed-off-by: Igor Druzhinin 
> ---
> 
> If people think that would make sense - I can rework the array to a list of
> domain pointers to avoid the limit.

Not sure about this. What is the biggest number you've found on any
system? (I assume the chosen value of 31 has some headroom.)

Instead I'm wondering whether this wouldn't better be a Kconfig
setting (or even command line controllable). There don't look to be
any restrictions on the precise value chosen (i.e. 2**n-1 like is
the case for old and new values here, for whatever reason), so a
simple permitted range of like 4...64 would seem fine to specify.
Whether the default then would want to be 8 (close to the current
7) or higher (around the actually observed maximum) is a different
question.
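The range check such a Kconfig or command-line setting would impose can be sketched as follows; the names, the 4...64 window, and the fall-back-to-default policy are illustrative only, taken from the suggestion above rather than from Xen:

```c
#include <assert.h>

/*
 * Hypothetical sketch: accept a requested "max guests per shared IRQ"
 * only inside the suggested 4...64 window, falling back to a default
 * of 8 otherwise.  Not Xen's actual option handling.
 */
#define IRQ_MAX_GUESTS_MIN 4
#define IRQ_MAX_GUESTS_MAX 64
#define IRQ_MAX_GUESTS_DEFAULT 8

static unsigned int irq_max_guests(long requested)
{
    if ( requested < IRQ_MAX_GUESTS_MIN || requested > IRQ_MAX_GUESTS_MAX )
        return IRQ_MAX_GUESTS_DEFAULT;
    return (unsigned int)requested;
}
```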

Jan



Re: [PATCH v4 01/11] viridian: don't blindly write to 32-bit registers is 'mode' is invalid

2020-12-02 Thread Jan Beulich
On 02.12.2020 10:21, Paul Durrant wrote:
> From: Paul Durrant 
> 
> If hvm_guest_x86_mode() returns something other than 8 or 4 then
> viridian_hypercall() will return immediately but, on the way out, will write
> back status as if 'mode' was 4. This patch simply makes it leave the registers
> alone.
> 
> NOTE: The formatting of the 'out' label and the switch statement are also
>   adjusted as per CODING_STYLE.
> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Wei Liu 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: "Roger Pau Monné" 
> 
> v4:
>  - Fixed another CODING_STYLE violation.

Partly:

> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -692,13 +692,15 @@ int viridian_hypercall(struct cpu_user_regs *regs)
>  break;
>  }
>  
> -out:
> + out:
>  output.result = status;
> -switch (mode) {
> +switch (mode)

There are also two blanks missing here. Will again record this as
to be taken care of while committing, once an ack arrives. (And
btw, the earlier of the two "is" in the subject also wants to be
"if".)

Jan



RE: [PATCH v4 01/11] viridian: don't blindly write to 32-bit registers is 'mode' is invalid

2020-12-02 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich 
> Sent: 02 December 2020 09:29
> To: Paul Durrant 
> Cc: Paul Durrant ; Wei Liu ; Andrew Cooper
> ; Roger Pau Monné ; 
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v4 01/11] viridian: don't blindly write to 32-bit 
> registers is 'mode' is invalid
> 
> On 02.12.2020 10:21, Paul Durrant wrote:
> > From: Paul Durrant 
> >
> > If hvm_guest_x86_mode() returns something other than 8 or 4 then
> > viridian_hypercall() will return immediately but, on the way out, will write
> > back status as if 'mode' was 4. This patch simply makes it leave the 
> > registers
> > alone.
> >
> > NOTE: The formatting of the 'out' label and the switch statement are also
> >   adjusted as per CODING_STYLE.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Wei Liu 
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: "Roger Pau Monné" 
> >
> > v4:
> >  - Fixed another CODING_STYLE violation.
> 
> Partly:
> 
> > --- a/xen/arch/x86/hvm/viridian/viridian.c
> > +++ b/xen/arch/x86/hvm/viridian/viridian.c
> > @@ -692,13 +692,15 @@ int viridian_hypercall(struct cpu_user_regs *regs)
> >  break;
> >  }
> >
> > -out:
> > + out:
> >  output.result = status;
> > -switch (mode) {
> > +switch (mode)
> 
> There are also two blanks missing here. Will again record this as
> to be taken care of while committing, once an ack arrives. (And
> btw, the earlier of the two "is" in the subject also wants to be
> "if".)

Gah. The lack of a style checker really is annoying. I'll fix locally in case 
there's a v5.

  Paul

> 
> Jan




[xen-unstable-coverity test] 157155: all pass - PUSHED

2020-12-02 Thread osstest service owner
flight 157155 xen-unstable-coverity real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157155/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 xen  3ae469af8e680df31eecd0a2ac6a83b58ad7ce53
baseline version:
 xen  f7d7d53f6464cff94ead4c15d21e79ce4d9173f5

Last test of basis   157090  2020-11-29 09:18:25 Z3 days
Testing same since   157155  2020-12-02 09:19:29 Z0 days1 attempts


People who touched revisions under test:
  Juergen Gross 
  Manuel Bouyer 
  Roger Pau Monné 

jobs:
 coverity-amd64   pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   f7d7d53f64..3ae469af8e  3ae469af8e680df31eecd0a2ac6a83b58ad7ce53 -> coverity-tested/smoke



Re: [PATCH v2 06/17] xen/cpupool: use ERR_PTR() for returning error cause from cpupool_create()

2020-12-02 Thread Jürgen Groß

On 02.12.20 09:58, Jan Beulich wrote:

On 01.12.2020 09:21, Juergen Gross wrote:

Instead of a pointer to an error variable as parameter just use
ERR_PTR() to return the cause of an error in cpupool_create().

This propagates to scheduler_alloc(), too.

Signed-off-by: Juergen Gross 


Reviewed-by: Jan Beulich 
with one small question:


--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -3233,26 +3233,25 @@ struct scheduler *scheduler_get_default(void)
  return &ops;
  }
  
-struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr)

+struct scheduler *scheduler_alloc(unsigned int sched_id)
  {
  int i;
+int ret;


I guess you didn't merge this with i's declaration because of a
plan/hope for i to be converted to unsigned int?


The main reason is I don't like overloading variables this way.

Any sane compiler will do that for me as it will discover that the
two variables are not alive at the same time, so the generated code
should be the same, while the written code stays more readable this
way.
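The ERR_PTR() convention the patch switches to can be shown in self-contained form: a small negative errno is encoded in the pointer value itself, so one return value carries either a valid allocation or the failure cause. The helpers below mirror the Xen/Linux ERR_PTR()/IS_ERR()/PTR_ERR() idiom; the sched_alloc() caller is a hypothetical stand-in for scheduler_alloc(), not the real code:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode/decode a negative errno in a pointer value. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
    /* Error pointers live in the top MAX_ERRNO bytes of the address space. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct sched { int id; };

/* Hypothetical allocator in the style of scheduler_alloc() after the patch. */
static struct sched *sched_alloc(unsigned int sched_id)
{
    struct sched *s;

    if ( sched_id > 3 )               /* unknown scheduler id */
        return ERR_PTR(-ENOENT);

    s = malloc(sizeof(*s));
    if ( !s )
        return ERR_PTR(-ENOMEM);

    s->id = (int)sched_id;
    return s;
}
```

The caller then tests IS_ERR() instead of threading a separate `int *perr` through every layer, which is exactly the simplification the patch makes.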


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] x86/vmap: handle superpages in vmap_to_mfn()

2020-12-02 Thread Jan Beulich
On 30.11.2020 17:50, Hongyan Xia wrote:
> From: Hongyan Xia 
> 
> There is simply no guarantee that vmap won't return superpages to the
> caller. It can happen if the list of MFNs is contiguous, or we simply
> have a large granularity. Although rare, if such things do happen, we
> will simply hit BUG_ON() and crash. Properly handle such cases in a new
> implementation.
> 
> Note that vmap_to_mfn() is now too large to be a macro, so implement it as a
> normal function and move the declaration to mm.h (page.h cannot handle
> mfn_t).

I'm not happy about this movement, and it's also not really clear to
me what problem page.h would have in principle. Yes, it can't
include xen/mm.h, but I think it is long overdue that we disentangle
this at least some. Let me see what I can do as a prereq for this
change, but see also below.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5194,6 +5194,49 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>  }  \
>  } while ( false )
>  
> +mfn_t vmap_to_mfn(const void *v)
> +{
> +bool locking = system_state > SYS_STATE_boot;
> +unsigned int l2_offset = l2_table_offset((unsigned long)v);
> +unsigned int l1_offset = l1_table_offset((unsigned long)v);
> +l3_pgentry_t *pl3e = virt_to_xen_l3e((unsigned long)v);
> +l2_pgentry_t *pl2e;
> +l1_pgentry_t *pl1e;

Can't all three of them be pointer-to-const?

> +struct page_info *l3page;
> +mfn_t ret;
> +
> +ASSERT(pl3e);

if ( !pl3e )
{
ASSERT_UNREACHABLE();
return INVALID_MFN;
}

as per the bottom of ./CODING_STYLE? (Similarly further down
then.)

> +l3page = virt_to_page(pl3e);
> +L3T_LOCK(l3page);
> +
> +ASSERT(l3e_get_flags(*pl3e) & _PAGE_PRESENT);
> +if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
> +{
> +ret = mfn_add(l3e_get_mfn(*pl3e),
> +  (l2_offset << PAGETABLE_ORDER) + l1_offset);
> +L3T_UNLOCK(l3page);
> +return ret;

To keep the locked region as narrow as possible

mfn = l3e_get_mfn(*pl3e);
L3T_UNLOCK(l3page);
return mfn_add(mfn, (l2_offset << PAGETABLE_ORDER) + l1_offset);

? However, in particular because of the recurring unlocks on
the exit paths I wonder whether it wouldn't be better to
restructure the whole function such that there'll be one unlock
and one return. Otoh I'm afraid what I'm asking for here is
going to yield a measurable set of goto-s ...
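The single-unlock structure being suggested can be sketched generically: every early-exit path jumps to one "out" label that releases the lock, rather than unlocking on each return. The simulated lock and two-level lookup below are only the shape of the idea, not Xen's page-table walk:

```c
#include <assert.h>

static int lock_depth;   /* stand-in for L3T_LOCK()/L3T_UNLOCK() */

static int lookup(int level1_present, int level2_present, int *out_val)
{
    int rc;

    lock_depth++;                     /* "L3T_LOCK" */

    if ( !level1_present )
    {
        rc = -1;
        goto out;
    }

    if ( !level2_present )
    {
        rc = -1;
        goto out;
    }

    *out_val = 42;
    rc = 0;

 out:
    lock_depth--;                     /* one "L3T_UNLOCK" for all paths */
    return rc;
}
```

The price is the goto-s Jan mentions; the benefit is that no exit path can forget the unlock.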

> +}
> +
> +pl2e = map_l2t_from_l3e(*pl3e) + l2_offset;
> +ASSERT(l2e_get_flags(*pl2e) & _PAGE_PRESENT);
> +if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
> +{
> +ret = mfn_add(l2e_get_mfn(*pl2e), l1_offset);
> +L3T_UNLOCK(l3page);
> +return ret;
> +}
> +
> +pl1e = map_l1t_from_l2e(*pl2e) + l1_offset;
> +UNMAP_DOMAIN_PAGE(pl2e);
> +ASSERT(l1e_get_flags(*pl1e) & _PAGE_PRESENT);
> +ret = l1e_get_mfn(*pl1e);
> +L3T_UNLOCK(l3page);
> +UNMAP_DOMAIN_PAGE(pl1e);
> +
> +return ret;
> +}

Now for the name of the function: The only aspect tying it
somewhat to vmap() is that it assumes (asserts) it'll find a
valid mapping. I think it wants renaming, and vmap_to_mfn()
then would become a #define of it (perhaps even retaining
its property of getting unsigned long passed in), at which
point it also doesn't need moving out of page.h. As to the
actual name, xen_map_to_mfn() to somewhat match up with
map_pages_to_xen()?

Jan



Re: [PATCH] xen: remove trailing semicolon in macro definition

2020-12-02 Thread Jürgen Groß

On 27.11.20 17:07, t...@redhat.com wrote:

From: Tom Rix 

The macro use will already have a semicolon.

Signed-off-by: Tom Rix 


Reviewed-by: Juergen Gross 


Juergen




Re: [PATCH v2 06/17] xen/cpupool: use ERR_PTR() for returning error cause from cpupool_create()

2020-12-02 Thread Jan Beulich
On 02.12.2020 10:56, Jürgen Groß wrote:
> On 02.12.20 09:58, Jan Beulich wrote:
>> On 01.12.2020 09:21, Juergen Gross wrote:
>>> Instead of a pointer to an error variable as parameter just use
>>> ERR_PTR() to return the cause of an error in cpupool_create().
>>>
>>> This propagates to scheduler_alloc(), too.
>>>
>>> Signed-off-by: Juergen Gross 
>>
>> Reviewed-by: Jan Beulich 
>> with one small question:
>>
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>> @@ -3233,26 +3233,25 @@ struct scheduler *scheduler_get_default(void)
>>>   return &ops;
>>>   }
>>>   
>>> -struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr)
>>> +struct scheduler *scheduler_alloc(unsigned int sched_id)
>>>   {
>>>   int i;
>>> +int ret;
>>
>> I guess you didn't merge this with i's declaration because of a
>> plan/hope for i to be converted to unsigned int?
> 
> The main reason is I don't like overloading variables this way.

That's not what I asked. I was wondering why the new decl wasn't

int i, ret;

Jan



Re: [PATCH v2 06/17] xen/cpupool: use ERR_PTR() for returning error cause from cpupool_create()

2020-12-02 Thread Jürgen Groß

On 02.12.20 11:46, Jan Beulich wrote:

On 02.12.2020 10:56, Jürgen Groß wrote:

On 02.12.20 09:58, Jan Beulich wrote:

On 01.12.2020 09:21, Juergen Gross wrote:

Instead of a pointer to an error variable as parameter just use
ERR_PTR() to return the cause of an error in cpupool_create().

This propagates to scheduler_alloc(), too.

Signed-off-by: Juergen Gross 


Reviewed-by: Jan Beulich 
with one small question:


--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -3233,26 +3233,25 @@ struct scheduler *scheduler_get_default(void)
   return &ops;
   }
   
-struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr)

+struct scheduler *scheduler_alloc(unsigned int sched_id)
   {
   int i;
+int ret;


I guess you didn't merge this with i's declaration because of a
plan/hope for i to be converted to unsigned int?


The main reason is I don't like overloading variables this way.


That's not what I asked. I was wondering why the new decl wasn't

 int i, ret;


Oh, then I completely misunderstood your question, sorry.

I had no special reason when writing the patch, but I think changing
i to be unsigned is a good idea. I'll do that in the next iteration.


Juergen




Re: [PATCH v9] xen/events: do some cleanups in evtchn_fifo_set_pending()

2020-12-02 Thread Jan Beulich
On 02.12.2020 08:48, Juergen Gross wrote:
> evtchn_fifo_set_pending() can be simplified a little bit. Especially
> testing for the fifo control block to exist can be moved out of the
> main if clause of the function, enabling it to avoid testing the LINKED bit
> twice.
> 
> Suggested-by: Jan Beulich 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 



Re: [PATCH v2 5/7] xen/arm: Add handler for cp15 ID registers

2020-12-02 Thread Bertrand Marquis
Hi Volodymyr,

> On 1 Dec 2020, at 16:54, Volodymyr Babchuk  wrote:
> 
> 
> Hi,
> 
> Bertrand Marquis writes:
> 
>> Hi Volodymyr,
>> 
>>> On 1 Dec 2020, at 12:07, Volodymyr Babchuk  
>>> wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> Bertrand Marquis writes:
>>> 
 Hi,
 
> On 30 Nov 2020, at 20:31, Volodymyr Babchuk  
> wrote:
> 
> 
> Bertrand Marquis writes:
> 
>> Add support for emulation of cp15 based ID registers (on arm32 or when
>> running a 32bit guest on arm64).
>> The handlers are returning the values stored in the guest_cpuinfo
>> structure.
>> In the current status the MVFR registers are not supported.
> 
> It is unclear what will happen with registers that are not covered by
> guest_cpuinfo structure. According to ARM ARM, it is implementation
> defined if such accesses will be trapped. On the other hand, there are many
> registers which are RAZ. So, a well-behaved guest can try to read one of
> those registers and it will get an undefined instruction exception, instead
> of just reading all zeroes.
 
 This is true in the status of this patch but this is solved by the next 
 patch
 which is adding proper handling of those registers (add CP10 exception
 support), at least for MVFR ones.
 
 From ARM ARM point of view, I did handle all registers listed I think.
 If you think some are missing please point me to them as I do not
 completely understand what are the “registers not covered” unless
 you mean the MVFR ones.
>>> 
>>> Well, I may be wrong for aarch32 case, but for aarch64, there are number
>>> of reserved registers in IDs range. Those registers should read as
>>> zero. You can find them in the section "C5.1.6 op0==0b11, Moves to and
>>> from non-debug System registers and Special-purpose registers" of ARM
>>> DDI 0487B.a. Check out "Table C5-6 System instruction encodings for
>>> non-Debug System register accesses".
>> 
>> The point of the series is to handle all registers trapped due to the TID3 bit in HCR_EL2.
>> 
>> And I think I handled all of them but I might be wrong.
>> 
>> Handling all registers for op0==0b11 will cover a lot more things.
>> This can be done of course but this was not the point of this series.
>> 
>> The listing in HCR_EL2 documentation is pretty complete and if I miss any 
>> register
>> there please tell me but I do not understand from the documentation that all 
>> registers
>> with op0 3 are trapped by TID3.
> 
> My concern is that the same code may observe different effects when
> running in baremetal mode and as VM.
> 
> For example, we are trying to run a newer version of a kernel, that
> supports some hypothetical ARMv8.9. And it tries to read a new ID
> register which is absent in ARMv8.2. There are possible cases:
> 
> 0. It runs as a baremetal code on a compatible architecture. So it just
> accesses this register and all is fine.
> 
> 1. It runs as baremetal code on an older ARMv8 architecture. Current
> reference manual states that those registers should read as zero, so
> all is fine, as well.
> 
> 2. It runs as VM on an older architecture. It is IMPLEMENTATION DEFINED
> if HCR.TID3 will cause traps on access to unassigned registers:
> 
> 2a. Platform does not cause traps and software reads zeros directly from
> real registers. This is a good outcome.
> 
> 2b. Platform causes trap, and your code injects the undefined
> instruction exception. This is the case that bothers me. Well-written code
> that tries to be compatible with different versions of the architecture
> will fail there.
> 
> 3. It runs as VM on a newer architecture. I can only speculate there,
> but I think that the list of registers trapped by HCR.TID3 will be extended
> when new features are added. At least, this does not contradict with the
> current IMPLEMENTATION DEFINED constraint. In this case, hypervisor will
> inject exception when guest tries to access a valid register.
> 
> 
> So, in my opinion and to be compatible with the reference manual, we
> should allow guests to read "Reserved, RAZ" registers.

Thanks for the very detailed explanation, I now better understand what you
mean here.

I will try to check if we could return RAZ for “other” op0=3 registers and what
should be done on cp15 registers to have something equivalent.

Regards
Bertrand

> 
> 
> 
>> Regards
>> Bertrand
>> 
>> 
>>> 
>>> 
> 
>> Signed-off-by: Bertrand Marquis 
>> ---
>> Changes in V2: rebase
>> ---
>> xen/arch/arm/vcpreg.c | 35 +++
>> 1 file changed, 35 insertions(+)
>> 
>> diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
>> index cdc91cdf5b..d0c6406f34 100644
>> --- a/xen/arch/arm/vcpreg.c
>> +++ b/xen/arch/arm/vcpreg.c
>> @@ -155,6 +155,14 @@ TVM_REG32(CONTEXTIDR, CONTEXTIDR_EL1)
>>   break;  \
>>   }
>> 
>> +/* Macro to generate easily case for ID co-proces
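The "Reserved, RAZ" fallback argued for in this thread can be sketched in self-contained form: ID registers known to the emulation are served from a snapshot (in the spirit of guest_cpuinfo), while reads of unassigned-but-trapped ID registers return zero instead of injecting an undefined-instruction exception. The encodings and table contents below are illustrative, not Xen's:

```c
#include <assert.h>
#include <stdint.h>

struct id_reg { uint32_t encoding; uint64_t value; };

/* Snapshot of the ID registers the emulation knows about. */
static const struct id_reg known[] = {
    { 0x4020, 0x1122334455667788ull },  /* a populated ID register */
};

static uint64_t read_id_reg(uint32_t encoding)
{
    unsigned int i;

    for ( i = 0; i < sizeof(known) / sizeof(known[0]); i++ )
        if ( known[i].encoding == encoding )
            return known[i].value;

    return 0;  /* Reserved, RAZ: unassigned registers read as zero */
}
```

With this shape, a guest built for a newer revision of the architecture reads zeroes from registers the host does not implement (case 2b above), rather than taking an undefined-instruction exception.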

[xen-4.11-testing test] 157137: tolerable FAIL - PUSHED

2020-12-02 Thread osstest service owner
flight 157137 xen-4.11-testing real [real]
flight 157156 xen-4.11-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/157137/
http://logs.test-lab.xenproject.org/osstest/logs/157156/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-vhd  13 guest-start fail pass in 157156-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail in 157156 never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail in 157156 never pass
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install fail like 156986
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install fail like 156986
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 156986
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 156986
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 156986
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 156986
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 156986
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 156986
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 156986
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 156986
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 156986
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 156986
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 156986
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass

version targeted for testing:
 xen  41a822c3926350f26917d747c8dfed1c44a2cf42
baseline version:
 xen  62aed78b8e0cc6dcd99b80a528650ad0619b3909

Last test of basis   156986  2020-11-24 13:36:13 Z7 days
Testing same since   157137  2020-12-

Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common

2020-12-02 Thread Oleksandr



On 02.12.20 10:00, Jan Beulich wrote:

Hi Jan


On 01.12.2020 19:53, Oleksandr wrote:

On 01.12.20 13:03, Alex Bennée wrote:

Oleksandr Tyshchenko  writes:

@@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
   if ( s->emulator != current->domain )
   goto out;
   
-rc = p2m_set_ioreq_server(d, flags, s);

+rc = arch_ioreq_server_map_mem_type(d, s, flags);
   
out:

   spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
   
-if ( rc == 0 && flags == 0 )

-{
-struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-if ( read_atomic(&p2m->ioreq.entry_count) )
-p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
-}
-

It should be noted that p2m holds its own lock but I'm unfamiliar with
Xen's locking architecture. Is there anything that prevents another vCPU
accessing a page that is also being used by ioreq on the first vCPU?

I am not sure that I would be able to provide reasonable explanations here.
All I understand is that p2m_change_entry_type_global() is x86
specific (we don't have the p2m_ioreq_server concept on Arm) and should
remain as such (not exposed to the common code).
IIRC, I raised a question during V2 review whether we could have the ioreq
server lock around the call to p2m_change_entry_type_global() and didn't
get objections.

Not getting objections doesn't mean much. Personally I don't recall
such a question, but this doesn't mean much.


Sorry for not being clear here. Discussion happened at [1] when I was 
asked to move hvm_map_mem_type_to_ioreq_server() to the common code.




  The important thing
here is that you properly justify this change in the description (I
didn't look at this version of the patch as a whole yet, so quite
likely you actually do). This is because you need to guarantee that
you don't introduce any lock order violations by this.
Yes, almost all changes in this patch are mechanical and leave
things as they are.
The change with p2m_change_entry_type_global() requires additional
attention, so I decided to put emphasis on it in the description and to
add a comment in the code that it is called with the ioreq_server lock
held.



[1] 
https://patchwork.kernel.org/project/xen-devel/patch/1602780274-29141-2-git-send-email-olekst...@gmail.com/#23734839
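The lock-nesting comments Jan asks for pair naturally with an order check. One generic way to sketch the discipline (this is an illustrative pattern, not Xen's actual lock-order checker): give each lock a nesting rank and assert that acquisitions happen in strictly increasing rank order, so taking the p2m lock while holding the ioreq server lock is fine but the reverse order trips the assertion:

```c
#include <assert.h>

enum { RANK_IOREQ_SERVER = 1, RANK_P2M = 2 };

static int held[8];
static unsigned int nheld;

static void rank_lock(int rank)
{
    /* Out-of-order acquisition would assert here. */
    assert(nheld == 0 || rank > held[nheld - 1]);
    held[nheld++] = rank;
}

static void rank_unlock(int rank)
{
    /* Locks must be released in reverse acquisition order. */
    assert(nheld > 0 && held[nheld - 1] == rank);
    nheld--;
}
```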


--
Regards,

Oleksandr Tyshchenko




Re: [PATCH v2 5/7] xen/arm: Add handler for cp15 ID registers

2020-12-02 Thread Bertrand Marquis
Hi,

> On 2 Dec 2020, at 11:12, Bertrand Marquis  wrote:
> 
> Hi Volodymyr,
> 
>> On 1 Dec 2020, at 16:54, Volodymyr Babchuk  
>> wrote:
>> 
>> 
>> Hi,
>> 
>> Bertrand Marquis writes:
>> 
>>> Hi Volodymyr,
>>> 
 On 1 Dec 2020, at 12:07, Volodymyr Babchuk  
 wrote:
 
 
 Hi,
 
 Bertrand Marquis writes:
 
> Hi,
> 
>> On 30 Nov 2020, at 20:31, Volodymyr Babchuk  
>> wrote:
>> 
>> 
>> Bertrand Marquis writes:
>> 
>>> Add support for emulation of cp15 based ID registers (on arm32 or when
>>> running a 32bit guest on arm64).
>>> The handlers are returning the values stored in the guest_cpuinfo
>>> structure.
>>> In the current status the MVFR registers are not supported.
>> 
>> It is unclear what will happen with registers that are not covered by
>> guest_cpuinfo structure. According to ARM ARM, it is implementation
>> defined if such accesses will be trapped. On the other hand, there are many
>> registers which are RAZ. So, a well-behaved guest can try to read one of
>> those registers and it will get an undefined instruction exception, instead
>> of just reading all zeroes.
> 
> This is true in the status of this patch but this is solved by the next 
> patch
> which is adding proper handling of those registers (add CP10 exception
> support), at least for MVFR ones.
> 
> From ARM ARM point of view, I did handle all registers listed I think.
> If you think some are missing please point me to them as I do not
> completely understand what are the “registers not covered” unless
> you mean the MVFR ones.
 
 Well, I may be wrong for aarch32 case, but for aarch64, there are number
 of reserved registers in IDs range. Those registers should read as
 zero. You can find them in the section "C5.1.6 op0==0b11, Moves to and
 from non-debug System registers and Special-purpose registers" of ARM
 DDI 0487B.a. Check out "Table C5-6 System instruction encodings for
 non-Debug System register accesses".
>>> 
>>> The point of the series is to handle all registers trapped due to the TID3 bit in HCR_EL2.
>>> 
>>> And I think I handled all of them but I might be wrong.
>>> 
>>> Handling all registers for op0==0b11 will cover a lot more things.
>>> This can be done of course but this was not the point of this series.
>>> 
>>> The listing in the HCR_EL2 documentation is pretty complete, and if I missed
>>> any register there please tell me, but I do not understand from the
>>> documentation that all registers with op0 == 3 are trapped by TID3.
>> 
>> My concern is that the same code may observe different effects when
>> running in baremetal mode and as VM.
>> 
>> For example, we are trying to run a newer version of a kernel, that
>> supports some hypothetical ARMv8.9. And it tries to read a new ID
>> register which is absent in ARMv8.2. There are possible cases:
>> 
>> 0. It runs as a baremetal code on a compatible architecture. So it just
>> accesses this register and all is fine.
>> 
>> 1. It runs as baremetal code on an older Armv8 architecture. The current
>> reference manual states that those registers should read as zero, so
>> all is fine as well.
>> 
>> 2. It runs as a VM on an older architecture. It is IMPLEMENTATION DEFINED
>> whether HCR.TID3 will cause traps on accesses to unassigned registers:
>> 
>> 2a. Platform does not cause traps and software reads zeros directly from
>> real registers. This is a good outcome.
>> 
>> 2b. The platform causes a trap, and your code injects the undefined
>> instruction exception. This is the case that bothers me. Well-written code
>> that tries to be compatible with different versions of the architecture
>> will fail there.
>> 
>> 3. It runs as a VM on a newer architecture. I can only speculate there,
>> but I think that the list of registers trapped by HCR.TID3 will be extended
>> when new features are added. At least, this does not contradict the
>> current IMPLEMENTATION DEFINED constraint. In this case, the hypervisor will
>> inject an exception when the guest tries to access a valid register.
>> 
>> 
>> So, in my opinion and to be compatible with the reference manual, we
>> should allow guests to read "Reserved, RAZ" registers.
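For illustration, the RAZ behaviour argued for above can be sketched as follows; all names, encodings, and values here are illustrative stand-ins, not Xen's actual trap-handling code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a TID3 ID-register read handler.  Known
 * registers return the value stored in a guest_cpuinfo-like table;
 * encodings that are "Reserved, RAZ" in the architecture read as
 * zero instead of raising an undefined instruction exception.
 */
#define ID_REG_KNOWN 0x4020u      /* illustrative encoding */

static uint64_t guest_cpuinfo_lookup(uint32_t encoding, int *known)
{
    if (encoding == ID_REG_KNOWN) {
        *known = 1;
        return 0x11;              /* made-up feature value */
    }
    *known = 0;
    return 0;
}

/* Value the guest reads for a trapped ID-register access. */
static uint64_t emulate_id_read(uint32_t encoding)
{
    int known;
    uint64_t val = guest_cpuinfo_lookup(encoding, &known);

    /* Reserved encodings in the trapped range: RAZ, not #UNDEF. */
    return known ? val : 0;
}
```

The key design point is the fall-through: an unrecognised encoding in the trapped range is treated as data (zero), never as a fault.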
> 
> Thanks for the very detailed explanation; I now better understand what you
> mean here.
> 
> I will try to check whether we could return RAZ for “other” op0==3 registers,
> and what should be done for cp15 registers to have something equivalent.
> 

In fact I need to add handling for the other registers mentioned by the TID3
description in the Armv8 architecture manual:
"This field traps all MRS accesses to registers in the following range that are
not already mentioned in this field description: Op0 == 3, op1 == 0, CRn == c0,
CRm == {c1-c7}, op2 == {0-7}.”
"This field traps all MRC accesses to encodings in the following range that are
not already mentioned in this field description: coproc == p15, opc1 == 0,
CRn == c0, CRm == {c2
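The quoted MRS range reduces to a simple predicate; a sketch (field values as per the quote above, helper name and calling convention hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True if an MRS encoding falls in the TID3 catch-all range quoted
 * above: Op0 == 3, op1 == 0, CRn == c0, CRm == {c1-c7}, op2 == {0-7}.
 */
static bool tid3_traps_mrs(unsigned int op0, unsigned int op1,
                           unsigned int crn, unsigned int crm,
                           unsigned int op2)
{
    return op0 == 3 && op1 == 0 && crn == 0 &&
           crm >= 1 && crm <= 7 && op2 <= 7;
}
```

Note that CRm == c0 is excluded: those encodings are the individually listed ID registers, not the catch-all range.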

Re: [PATCH] x86/vmap: handle superpages in vmap_to_mfn()

2020-12-02 Thread Hongyan Xia
On Wed, 2020-12-02 at 11:04 +0100, Jan Beulich wrote:
> On 30.11.2020 17:50, Hongyan Xia wrote:
> > From: Hongyan Xia 
> > 
> > There is simply no guarantee that vmap won't return superpages to the
> > caller. It can happen if the list of MFNs is contiguous, or we simply
> > have a large granularity. Although rare, if such things do happen, we
> > will simply hit BUG_ON() and crash. Properly handle such cases in a new
> > implementation.
> > 
> > Note that vmap_to_mfn is now too large to be a macro, so implement it as
> > a normal function and move the declaration to mm.h (page.h cannot handle
> > mfn_t).
> 
> I'm not happy about this movement, and it's also not really clear to
> me what problem page.h would have in principle. Yes, it can't
> include xen/mm.h, but I think it is long overdue that we disentangle
> this at least some. Let me see what I can do as a prereq for this
> change, but see also below.
> 
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5194,6 +5194,49 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long
> > v)
> >  }  \
> >  } while ( false )
> >  
> > +mfn_t vmap_to_mfn(const void *v)
> > +{
> > +bool locking = system_state > SYS_STATE_boot;
> > +unsigned int l2_offset = l2_table_offset((unsigned long)v);
> > +unsigned int l1_offset = l1_table_offset((unsigned long)v);
> > +l3_pgentry_t *pl3e = virt_to_xen_l3e((unsigned long)v);
> > +l2_pgentry_t *pl2e;
> > +l1_pgentry_t *pl1e;
> 
> Can't all three of them be pointer-to-const?

They can (and yes they should).

> > +struct page_info *l3page;
> > +mfn_t ret;
> > +
> > +ASSERT(pl3e);
> 
> if ( !pl3e )
> {
> ASSERT_UNREACHABLE();
> return INVALID_MFN;
> }
> 
> as per the bottom of ./CODING_STYLE? (Similarly further down
> then.)

Will revise.

> > +l3page = virt_to_page(pl3e);
> > +L3T_LOCK(l3page);
> > +
> > +ASSERT(l3e_get_flags(*pl3e) & _PAGE_PRESENT);
> > +if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
> > +{
> > +ret = mfn_add(l3e_get_mfn(*pl3e),
> > +  (l2_offset << PAGETABLE_ORDER) + l1_offset);
> > +L3T_UNLOCK(l3page);
> > +return ret;
> 
> To keep the locked region as narrow as possible
> 
> mfn = l3e_get_mfn(*pl3e);
> L3T_UNLOCK(l3page);
> return mfn_add(mfn, (l2_offset << PAGETABLE_ORDER) +
> l1_offset);
> 
> ? However, in particular because of the recurring unlocks on
> the exit paths I wonder whether it wouldn't be better to
> restructure the whole function such that there'll be one unlock
> and one return. Otoh I'm afraid what I'm asking for here is
> going to yield a measurable set of goto-s ...

I can do that.

But what about the lock narrowing? It will be slightly trickier when
there is a goto. Naturally:

ret = full return mfn;
goto out;

out:
UNLOCK();
return ret;

but with narrowing, my first reaction is:

ret = high bits of mfn;
l2_offset = 0;
l1_offset = 0;
goto out;

out:
UNLOCK();
return ret + (l2_offset << PAGETABLE_ORDER) + l1_offset;

To be honest, I doubt it is really worth it and I prefer the first one.
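The first variant can be sketched like this — every path computes the complete MFN before jumping to the common unlock label; the lock and offset arithmetic are stand-ins, not Xen's actual L3T lock or MFN types:

```c
#include <assert.h>
#include <stdint.h>

#define PT_ORDER 9                /* stand-in for PAGETABLE_ORDER */

static int lock_held;
static void l3t_lock(void)   { lock_held = 1; }
static void l3t_unlock(void) { lock_held = 0; }

/*
 * Single-exit lookup: every path computes the complete return value
 * into 'ret' before jumping to the common unlock label, so there is
 * exactly one unlock and one return.
 */
static uint64_t lookup_mfn(uint64_t l3_mfn, unsigned int l2_offset,
                           unsigned int l1_offset, int superpage)
{
    uint64_t ret;

    l3t_lock();

    if (superpage) {
        ret = l3_mfn + (l2_offset << PT_ORDER) + l1_offset;
        goto out;
    }

    ret = l3_mfn + l1_offset;     /* stand-in for the deeper walk */

 out:
    l3t_unlock();
    return ret;
}
```

The trade-off discussed above is visible here: the addition happens while the lock is still held, which is what the narrowing variant would avoid at the cost of zeroing the offsets on the other paths.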

> 
> > +}
> > +
> > +pl2e = map_l2t_from_l3e(*pl3e) + l2_offset;
> > +ASSERT(l2e_get_flags(*pl2e) & _PAGE_PRESENT);
> > +if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
> > +{
> > +ret = mfn_add(l2e_get_mfn(*pl2e), l1_offset);
> > +L3T_UNLOCK(l3page);
> > +return ret;
> > +}
> > +
> > +pl1e = map_l1t_from_l2e(*pl2e) + l1_offset;
> > +UNMAP_DOMAIN_PAGE(pl2e);
> > +ASSERT(l1e_get_flags(*pl1e) & _PAGE_PRESENT);
> > +ret = l1e_get_mfn(*pl1e);
> > +L3T_UNLOCK(l3page);
> > +UNMAP_DOMAIN_PAGE(pl1e);
> > +
> > +return ret;
> > +}
> 
> Now for the name of the function: The only aspect tying it
> somewhat to vmap() is that it assumes (asserts) it'll find a
> valid mapping. I think it wants renaming, and vmap_to_mfn()
> then would become a #define of it (perhaps even retaining
> its property of getting unsigned long passed in), at which
> point it also doesn't need moving out of page.h. As to the
> actual name, xen_map_to_mfn() to somewhat match up with
> map_pages_to_xen()?

I actually really like this idea. I will come up with something in the
next rev. But if we want to make it generic, shouldn't we not ASSERT on
pl*e and the PRESENT flag and just return INVALID_MFN? Then this
function works on both mapped address (assumption of vmap_to_mfn()) and
other use cases.

Hongyan




[xen-unstable-smoke test] 157157: tolerable all pass - PUSHED

2020-12-02 Thread osstest service owner
flight 157157 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157157/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  cabf60fc32d4cfa1d74a2bdfcdb294a31da5d68e
baseline version:
 xen  3ae469af8e680df31eecd0a2ac6a83b58ad7ce53

Last test of basis   157112  2020-11-30 14:00:26 Z1 days
Testing same since   157157  2020-12-02 10:00:26 Z0 days1 attempts


People who touched revisions under test:
  Dario Faggioli 
  Jan Beulich 
  Juergen Gross 
  Julien Grall 
  Rahul Singh 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   3ae469af8e..cabf60fc32  cabf60fc32d4cfa1d74a2bdfcdb294a31da5d68e -> smoke



Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call

2020-12-02 Thread Borislav Petkov
On Fri, Nov 20, 2020 at 12:46:22PM +0100, Juergen Gross wrote:
> @@ -123,12 +115,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, 
> SYM_L_GLOBAL)
>* Try to use SYSRET instead of IRET if we're returning to
>* a completely clean 64-bit userspace context.  If we're not,
>* go to the slow exit path.
> +  * In the Xen PV case we must use iret anyway.
>*/
> -	movq	RCX(%rsp), %rcx
> -	movq	RIP(%rsp), %r11
>  
> -	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
> -	jne	swapgs_restore_regs_and_return_to_usermode
> +	ALTERNATIVE __stringify( \
> +		movq	RCX(%rsp), %rcx; \
> +		movq	RIP(%rsp), %r11; \
> +		cmpq	%rcx, %r11; /* SYSRET requires RCX == RIP */ \
> +		jne	swapgs_restore_regs_and_return_to_usermode), \
> +		"jmp	swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV

Why such a big ALTERNATIVE when you can simply do:

/*
 * Try to use SYSRET instead of IRET if we're returning to
 * a completely clean 64-bit userspace context.  If we're not,
 * go to the slow exit path.
 * In the Xen PV case we must use iret anyway.
 */
ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", 
X86_FEATURE_XENPV

	movq	RCX(%rsp), %rcx
	movq	RIP(%rsp), %r11
	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
	jne	swapgs_restore_regs_and_return_to_usermode

?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/vmap: handle superpages in vmap_to_mfn()

2020-12-02 Thread Jan Beulich
On 02.12.2020 13:17, Hongyan Xia wrote:
> On Wed, 2020-12-02 at 11:04 +0100, Jan Beulich wrote:
>> On 30.11.2020 17:50, Hongyan Xia wrote:
>>> +l3page = virt_to_page(pl3e);
>>> +L3T_LOCK(l3page);
>>> +
>>> +ASSERT(l3e_get_flags(*pl3e) & _PAGE_PRESENT);
>>> +if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
>>> +{
>>> +ret = mfn_add(l3e_get_mfn(*pl3e),
>>> +  (l2_offset << PAGETABLE_ORDER) + l1_offset);
>>> +L3T_UNLOCK(l3page);
>>> +return ret;
>>
>> To keep the locked region as narrow as possible
>>
>> mfn = l3e_get_mfn(*pl3e);
>> L3T_UNLOCK(l3page);
>> return mfn_add(mfn, (l2_offset << PAGETABLE_ORDER) +
>> l1_offset);
>>
>> ? However, in particular because of the recurring unlocks on
>> the exit paths I wonder whether it wouldn't be better to
>> restructure the whole function such that there'll be one unlock
>> and one return. Otoh I'm afraid what I'm asking for here is
>> going to yield a measurable set of goto-s ...
> 
> I can do that.
> 
> But what about the lock narrowing? Will be slightly more tricky when
> there is goto. Naturally:
> 
> ret = full return mfn;
> goto out;
> 
> out:
> UNLOCK();
> return ret;
> 
> but with narrowing, my first reaction is:
> 
> ret = high bits of mfn;
> l2_offset = 0;
> l1_offset = 0;
> goto out;
> 
> out:
> UNLOCK();
> return ret + (l2_offset << PAGETABLE_ORDER) + l1_offset;
> 
> To be honest, I doubt it is really worth it and I prefer the first one.

That's why I said "However ..." - I did realize both won't fit
together very well.

>>> +}
>>> +
>>> +pl2e = map_l2t_from_l3e(*pl3e) + l2_offset;
>>> +ASSERT(l2e_get_flags(*pl2e) & _PAGE_PRESENT);
>>> +if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
>>> +{
>>> +ret = mfn_add(l2e_get_mfn(*pl2e), l1_offset);
>>> +L3T_UNLOCK(l3page);
>>> +return ret;
>>> +}
>>> +
>>> +pl1e = map_l1t_from_l2e(*pl2e) + l1_offset;
>>> +UNMAP_DOMAIN_PAGE(pl2e);
>>> +ASSERT(l1e_get_flags(*pl1e) & _PAGE_PRESENT);
>>> +ret = l1e_get_mfn(*pl1e);
>>> +L3T_UNLOCK(l3page);
>>> +UNMAP_DOMAIN_PAGE(pl1e);
>>> +
>>> +return ret;
>>> +}
>>
>> Now for the name of the function: The only aspect tying it
>> somewhat to vmap() is that it assumes (asserts) it'll find a
>> valid mapping. I think it wants renaming, and vmap_to_mfn()
>> then would become a #define of it (perhaps even retaining
>> its property of getting unsigned long passed in), at which
>> point it also doesn't need moving out of page.h. As to the
>> actual name, xen_map_to_mfn() to somewhat match up with
>> map_pages_to_xen()?
> 
> I actually really like this idea. I will come up with something in the
> next rev. But if we want to make it generic, shouldn't we not ASSERT on
> pl*e and the PRESENT flag and just return INVALID_MFN? Then this
> function works on both mapped address (assumption of vmap_to_mfn()) and
> other use cases.

Depends - we can still document that this function is going to
require a valid mapping. I did consider the generalization, too,
but this to a certain degree also collides with virt_to_xen_l3e()
allocating an L3 table, which isn't what we would want for a
fully generic lookup function. IOW - I'm undecided and will take
it from wherever you move it (albeit with no promise to not ask
for further adjustment).

Jan
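A sketch of the non-asserting variant being weighed here — a missing table or non-present entry yields a sentinel instead of an assertion; the INVALID_MFN value, entry layout, and helper name are illustrative, not Xen's actual types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_MFN  (~0ULL)      /* sentinel for "no mapping" */
#define PAGE_PRESENT 0x1u

struct pte { unsigned int flags; uint64_t mfn; };

/*
 * Generic lookup: a missing table or a non-present entry returns
 * INVALID_MFN instead of triggering an assertion, so the helper can
 * serve both the vmap case (mapping known to exist) and arbitrary
 * addresses.
 */
static uint64_t lookup(const struct pte *e)
{
    if (!e || !(e->flags & PAGE_PRESENT))
        return INVALID_MFN;
    return e->mfn;
}
```

The caller that knows the mapping exists (the vmap_to_mfn() wrapper) could still assert that the result is not INVALID_MFN.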



Re: [PATCH v2 2/8] xen/arm: revert atomic operation related command-queue insertion patch

2020-12-02 Thread Rahul Singh
Hello Stefano,

Thanks for reviewing the code.

> On 1 Dec 2020, at 10:23 pm, Stefano Stabellini  wrote:
> 
> On Thu, 26 Nov 2020, Rahul Singh wrote:
>> Linux SMMUv3 code implements command-queue insertion based on atomic
>> operations implemented in Linux. The atomic functions used by
>> command-queue insertion are not implemented in Xen; therefore, revert
>> the patch that implemented command-queue insertion based on atomic
>> operations.
>> 
>> Once proper atomic operations are available in Xen, the driver
>> can be updated.
>> 
>> This reverts commit 587e6c10a7ce89a5924fdbeff2ec524fbd6a124b
>> ("iommu/arm-smmu-v3: Reduce contention during command-queue insertion").
> 
> I checked 587e6c10a7ce89a5924fdbeff2ec524fbd6a124b: this patch does more
> than just reverting 587e6c10a7ce89a5924fdbeff2ec524fbd6a124b. It looks
> like it is also reverting edd0351e7bc49555d8b5ad8438a65a7ca262c9f0 and
> some other commits.
> 
> Please can you provide a complete list of reverted commits? I would like
> to be able to do the reverts myself on the linux tree and see that the
> driver textually matches the one on the xen tree with this patch
> applied.
> 
> 

Yes, this patch also reverts the commits that are based on the code that
introduced the atomic operations. I will list all the commit IDs in the
commit message of the next version of the patch.

Patches that are reverted in this patch are as follows:

9e773aee8c3e1b3ba019c5c7f8435aaa836c6130  iommu/arm-smmu-v3: Batch ATC invalidation commands
edd0351e7bc49555d8b5ad8438a65a7ca262c9f0  iommu/arm-smmu-v3: Batch context descriptor invalidation
4ce8da453640147101bda418640394637c1a7cfc  iommu/arm-smmu-v3: Add command queue batching helpers
2af2e72b18b499fa36d3f7379fd010ff25d2a984  iommu/arm-smmu-v3: Defer TLB invalidation until ->iotlb_sync()
587e6c10a7ce89a5924fdbeff2ec524fbd6a124b  iommu/arm-smmu-v3: Reduce contention during command-queue insertion


Regards,
Rahul

> 
>> Signed-off-by: Rahul Singh 
>> ---
>> xen/drivers/passthrough/arm/smmu-v3.c | 847 ++
>> 1 file changed, 180 insertions(+), 667 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
>> b/xen/drivers/passthrough/arm/smmu-v3.c
>> index c192544e87..97eac61ea4 100644
>> --- a/xen/drivers/passthrough/arm/smmu-v3.c
>> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
>> @@ -330,15 +330,6 @@
>> #define CMDQ_ERR_CERROR_ABT_IDX  2
>> #define CMDQ_ERR_CERROR_ATC_INV_IDX  3
>> 
>> -#define CMDQ_PROD_OWNED_FLAGQ_OVERFLOW_FLAG
>> -
>> -/*
>> - * This is used to size the command queue and therefore must be at least
>> - * BITS_PER_LONG so that the valid_map works correctly (it relies on the
>> - * total number of queue entries being a multiple of BITS_PER_LONG).
>> - */
>> -#define CMDQ_BATCH_ENTRIES  BITS_PER_LONG
>> -
>> #define CMDQ_0_OPGENMASK_ULL(7, 0)
>> #define CMDQ_0_SSV   (1UL << 11)
>> 
>> @@ -407,8 +398,9 @@
>> #define PRIQ_1_ADDR_MASK GENMASK_ULL(63, 12)
>> 
>> /* High-level queue structures */
>> -#define ARM_SMMU_POLL_TIMEOUT_US100 /* 1s! */
>> -#define ARM_SMMU_POLL_SPIN_COUNT10
>> +#define ARM_SMMU_POLL_TIMEOUT_US100
>> +#define ARM_SMMU_CMDQ_SYNC_TIMEOUT_US   100 /* 1s! */
>> +#define ARM_SMMU_CMDQ_SYNC_SPIN_COUNT   10
>> 
>> #define MSI_IOVA_BASE0x800
>> #define MSI_IOVA_LENGTH  0x10
>> @@ -513,24 +505,15 @@ struct arm_smmu_cmdq_ent {
>> 
>>  #define CMDQ_OP_CMD_SYNC0x46
>>  struct {
>> +u32 msidata;
>>  u64 msiaddr;
>>  } sync;
>>  };
>> };
>> 
>> struct arm_smmu_ll_queue {
>> -union {
>> -u64 val;
>> -struct {
>> -u32 prod;
>> -u32 cons;
>> -};
>> -struct {
>> -atomic_tprod;
>> -atomic_tcons;
>> -} atomic;
>> -u8  __pad[SMP_CACHE_BYTES];
>> -} cacheline_aligned_in_smp;
>> +u32 prod;
>> +u32 cons;
>>  u32 max_n_shift;
>> };
>> 
>> @@ -548,23 +531,9 @@ struct arm_smmu_queue {
>>  u32 __iomem *cons_reg;
>> };
>> 
>> -struct arm_smmu_queue_poll {
>> -ktime_t timeout;
>> -unsigned intdelay;
>> -unsigned intspin_cnt;
>> -boolwfe;
>> -};
>> -
>> struct arm_smmu_cmdq {
>>  struct arm_smmu_queue   q;
>> -atomic_long_t   *valid_map;
>> -atomic_towner_prod;
>> -atomic_tlock;
>> -};
>> -
>> -struct arm_smmu_cmdq_batch {
>> -

Re: [PATCH v2 5/8] xen/arm: Remove support for PCI ATS on SMMUv3

2020-12-02 Thread Rahul Singh
Hello Stefano,

> On 2 Dec 2020, at 12:39 am, Stefano Stabellini  wrote:
> 
> On Thu, 26 Nov 2020, Rahul Singh wrote:
>> PCI ATS functionality is not implemented or tested on Arm. Remove the
>> PCI ATS support; once PCI ATS support is tested and available, this
>> patch can be re-added.
>> 
>> Signed-off-by: Rahul Singh 
> 
> This looks like a revert of 9ce27afc0830f. Can we add that as a note to
> the commit message?
 
Ok I will add in next version.

> 
> One very minor comment at the bottom

Ack.

Regards,
Rahul
> 
> 
>> ---
>> xen/drivers/passthrough/arm/smmu-v3.c | 273 --
>> 1 file changed, 273 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
>> b/xen/drivers/passthrough/arm/smmu-v3.c
>> index 401f7ae006..6a33628087 100644
>> --- a/xen/drivers/passthrough/arm/smmu-v3.c
>> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
>> @@ -460,16 +460,6 @@ struct arm_smmu_cmdq_ent {
>>  u64 addr;
>>  } tlbi;
>> 
>> -#define CMDQ_OP_ATC_INV 0x40
>> -#define ATC_INV_SIZE_ALL52
>> -struct {
>> -u32 sid;
>> -u32 ssid;
>> -u64 addr;
>> -u8  size;
>> -boolglobal;
>> -} atc;
>> -
>>  #define CMDQ_OP_PRI_RESP0x41
>>  struct {
>>  u32 sid;
>> @@ -632,7 +622,6 @@ struct arm_smmu_master {
>>  struct list_headdomain_head;
>>  u32 *sids;
>>  unsigned intnum_sids;
>> -boolats_enabled;
>>  unsigned intssid_bits;
>> };
>> 
>> @@ -650,7 +639,6 @@ struct arm_smmu_domain {
>> 
>>  struct io_pgtable_ops   *pgtbl_ops;
>>  boolnon_strict;
>> -atomic_tnr_ats_masters;
>> 
>>  enum arm_smmu_domain_stage  stage;
>>  union {
>> @@ -886,14 +874,6 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct 
>> arm_smmu_cmdq_ent *ent)
>>  case CMDQ_OP_TLBI_S12_VMALL:
>>  cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
>>  break;
>> -case CMDQ_OP_ATC_INV:
>> -cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
>> -cmd[0] |= FIELD_PREP(CMDQ_ATC_0_GLOBAL, ent->atc.global);
>> -cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SSID, ent->atc.ssid);
>> -cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, ent->atc.sid);
>> -cmd[1] |= FIELD_PREP(CMDQ_ATC_1_SIZE, ent->atc.size);
>> -cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
>> -break;
>>  case CMDQ_OP_PRI_RESP:
>>  cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
>>  cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SSID, ent->pri.ssid);
>> @@ -925,7 +905,6 @@ static void arm_smmu_cmdq_skip_err(struct 
>> arm_smmu_device *smmu)
>>  [CMDQ_ERR_CERROR_NONE_IDX]  = "No error",
>>  [CMDQ_ERR_CERROR_ILL_IDX]   = "Illegal command",
>>  [CMDQ_ERR_CERROR_ABT_IDX]   = "Abort on command fetch",
>> -[CMDQ_ERR_CERROR_ATC_INV_IDX]   = "ATC invalidate timeout",
>>  };
>> 
>>  int i;
>> @@ -945,14 +924,6 @@ static void arm_smmu_cmdq_skip_err(struct 
>> arm_smmu_device *smmu)
>>  dev_err(smmu->dev, "retrying command fetch\n");
>>  case CMDQ_ERR_CERROR_NONE_IDX:
>>  return;
>> -case CMDQ_ERR_CERROR_ATC_INV_IDX:
>> -/*
>> - * ATC Invalidation Completion timeout. CONS is still pointing
>> - * at the CMD_SYNC. Attempt to complete other pending commands
>> - * by repeating the CMD_SYNC, though we might well end up back
>> - * here since the ATC invalidation may still be pending.
>> - */
>> -return;
>>  case CMDQ_ERR_CERROR_ILL_IDX:
>>  default:
>>  break;
>> @@ -1422,9 +1393,6 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>  val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
>>  }
>> 
>> -if (master->ats_enabled)
>> -dst[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
>> - STRTAB_STE_1_EATS_TRANS));
>> 
>>  arm_smmu_sync_ste_for_sid(smmu, sid);
>>  /* See comment in arm_smmu_write_ctx_desc() */
>> @@ -1633,112 +1601,6 @@ static irqreturn_t arm_smmu_combined_irq_handler(int 
>> irq, void *dev)
>>  return IRQ_WAKE_THREAD;
>> }
>> 
>> -static void
>> -arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
>> -struct arm_smmu_cmdq_ent *cmd)
>> -{
>> -size_t log2_span;
>> -size_t span_mask;
>> -/* ATC inv

Re: [PATCH v2 4/8] xen/arm: Remove support for MSI on SMMUv3

2020-12-02 Thread Rahul Singh
Hello Stefano,

> On 2 Dec 2020, at 12:40 am, Stefano Stabellini  wrote:
> 
> On Tue, 1 Dec 2020, Stefano Stabellini wrote:
>> On Thu, 26 Nov 2020, Rahul Singh wrote:
>>> Xen does not support MSI on Arm platforms; therefore, remove the MSI
>>> support from the SMMUv3 driver.
>>> 
>>> Signed-off-by: Rahul Singh 
>> 
>> I wonder if it makes sense to #ifdef CONFIG_MSI this code instead of
>> removing it completely.
> 
> One more thought: could this patch be achieved by reverting
> 166bdbd23161160f2abcea70621adba179050bee? If this patch could be done
> with a couple of reverts, it would be great to say so in the commit
> message.
> 
 OK, I will add that in the next version.

> 
>> In the past, we tried to keep the entire file as textually similar to
>> the original Linux driver as possible to make it easier to backport
>> features and fixes. So, in this case we would probably not even used an
>> #ifdef but maybe something like:
>> 
>>  if (/* msi_enabled */ 0)
>>  return;
>> 
>> at the beginning of arm_smmu_setup_msis.
>> 
>> 
>> However, that strategy didn't actually work very well because backports
>> have proven difficult to do anyway. So at that point we might as well at
>> least have clean code in Xen and do the changes properly.

The main reason to remove features/code that are not usable in Xen is to have
clean code.

Regards,
Rahul

>> 
>> So that's my reasoning for accepting this patch :-)
>> 
>> Julien, are you happy with this too?
>> 
>> 
>>> ---
>>> xen/drivers/passthrough/arm/smmu-v3.c | 176 +-
>>> 1 file changed, 3 insertions(+), 173 deletions(-)
>>> 
>>> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
>>> b/xen/drivers/passthrough/arm/smmu-v3.c
>>> index cec304e51a..401f7ae006 100644
>>> --- a/xen/drivers/passthrough/arm/smmu-v3.c
>>> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
>>> @@ -416,31 +416,6 @@ enum pri_resp {
>>> PRI_RESP_SUCC = 2,
>>> };
>>> 
>>> -enum arm_smmu_msi_index {
>>> -   EVTQ_MSI_INDEX,
>>> -   GERROR_MSI_INDEX,
>>> -   PRIQ_MSI_INDEX,
>>> -   ARM_SMMU_MAX_MSIS,
>>> -};
>>> -
>>> -static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
>>> -   [EVTQ_MSI_INDEX] = {
>>> -   ARM_SMMU_EVTQ_IRQ_CFG0,
>>> -   ARM_SMMU_EVTQ_IRQ_CFG1,
>>> -   ARM_SMMU_EVTQ_IRQ_CFG2,
>>> -   },
>>> -   [GERROR_MSI_INDEX] = {
>>> -   ARM_SMMU_GERROR_IRQ_CFG0,
>>> -   ARM_SMMU_GERROR_IRQ_CFG1,
>>> -   ARM_SMMU_GERROR_IRQ_CFG2,
>>> -   },
>>> -   [PRIQ_MSI_INDEX] = {
>>> -   ARM_SMMU_PRIQ_IRQ_CFG0,
>>> -   ARM_SMMU_PRIQ_IRQ_CFG1,
>>> -   ARM_SMMU_PRIQ_IRQ_CFG2,
>>> -   },
>>> -};
>>> -
>>> struct arm_smmu_cmdq_ent {
>>> /* Common fields */
>>> u8  opcode;
>>> @@ -504,10 +479,6 @@ struct arm_smmu_cmdq_ent {
>>> } pri;
>>> 
>>> #define CMDQ_OP_CMD_SYNC0x46
>>> -   struct {
>>> -   u32 msidata;
>>> -   u64 msiaddr;
>>> -   } sync;
>>> };
>>> };
>>> 
>>> @@ -649,12 +620,6 @@ struct arm_smmu_device {
>>> 
>>> struct arm_smmu_strtab_cfg  strtab_cfg;
>>> 
>>> -   /* Hi16xx adds an extra 32 bits of goodness to its MSI payload */
>>> -   union {
>>> -   u32 sync_count;
>>> -   u64 padding;
>>> -   };
>>> -
>>> /* IOMMU core code handle */
>>> struct iommu_device iommu;
>>> };
>>> @@ -945,20 +910,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct 
>>> arm_smmu_cmdq_ent *ent)
>>> cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
>>> break;
>>> case CMDQ_OP_CMD_SYNC:
>>> -   if (ent->sync.msiaddr)
>>> -   cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, 
>>> CMDQ_SYNC_0_CS_IRQ);
>>> -   else
>>> -   cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, 
>>> CMDQ_SYNC_0_CS_SEV);
>>> -   cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
>>> -   cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, 
>>> ARM_SMMU_MEMATTR_OIWB);
>>> -   /*
>>> -* Commands are written little-endian, but we want the SMMU to
>>> -* receive MSIData, and thus write it back to memory, in CPU
>>> -* byte order, so big-endian needs an extra byteswap here.
>>> -*/
>>> -   cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIDATA,
>>> -cpu_to_le32(ent->sync.msidata));
>>> -   cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
>>> +   cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
>>> break;
>>> default:
>>> return -ENOENT;
>>> @@ -1054,50 +1006,6 @@ static void arm_smmu_cmdq_issue_cmd(struct 
>>> arm_smmu_device *smmu,
>>> spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
>>> }
>>> 
>>> -/*
>>> - * The difference between val and sync_idx is bounded by the maximum size 
>>> of
>>> - * a queue at 2^20 entries, 

Re: [PATCH v2 6/8] xen/arm: Remove support for Stage-1 translation on SMMUv3.

2020-12-02 Thread Rahul Singh
Hello Stefano,

> On 2 Dec 2020, at 12:53 am, Stefano Stabellini  wrote:
> 
> On Thu, 26 Nov 2020, Rahul Singh wrote:
>> Linux SMMUv3 driver supports both Stage-1 and Stage-2 translations.
>> As of now only Stage-2 translation support has been tested.
>> 
>> Once Stage-1 translation support is tested, this patch can be re-added.
>> 
>> Signed-off-by: Rahul Singh 
> 
> [...]
> 
> 
>> @@ -1871,19 +1476,9 @@ static int arm_smmu_domain_finalise(struct 
>> iommu_domain *domain,
>>  }
>> 
>>  /* Restrict the stage to what we can actually support */
>> -if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
>> -smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
>> -if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
>> -smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
>> +smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
> 
> It would be good to add an helpful error message if
> ARM_SMMU_FEAT_TRANS_S2 is missing.
> 

Ack. I will add error message.

Regards,
Rahul
> 
>>  switch (smmu_domain->stage) {
>> -case ARM_SMMU_DOMAIN_S1:
>> -ias = (smmu->features & ARM_SMMU_FEAT_VAX) ? 52 : 48;
>> -ias = min_t(unsigned long, ias, VA_BITS);
>> -oas = smmu->ias;
>> -fmt = ARM_64_LPAE_S1;
>> -finalise_stage_fn = arm_smmu_domain_finalise_s1;
>> -break;
>>  case ARM_SMMU_DOMAIN_NESTED:
>>  case ARM_SMMU_DOMAIN_S2:
>>  ias = smmu->ias;




Re: [PATCH v2 2/8] xen/arm: revert atomic operation related command-queue insertion patch

2020-12-02 Thread Julien Grall

Hi Rahul,

On 26/11/2020 17:02, Rahul Singh wrote:

Linux SMMUv3 code implements command-queue insertion based on atomic
operations implemented in Linux. The atomic functions used by
command-queue insertion are not implemented in Xen; therefore, revert
the patch that implemented command-queue insertion based on atomic
operations.


This commit message explains why we revert, but not the consequences of
the revert. Can you outline whether there are any, and why they are fine?


I am also interested in having a list of *must*-haves for the driver to
move out of tech preview.


Cheers,

--
Julien Grall



[xen-4.10-testing test] 157138: tolerable FAIL - PUSHED

2020-12-02 Thread osstest service owner
flight 157138 xen-4.10-testing real [real]
flight 157159 xen-4.10-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/157138/
http://logs.test-lab.xenproject.org/osstest/logs/157159/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-qemuu-freebsd11-amd64 19 guest-localmigrate/x10 fail pass in 
157159-retest

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-thunderx  3 hosts-allocate   fail  like 156985
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail like 156985
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail like 156985
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 156985
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 156985
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 156985
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 156985
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 156985
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 156985
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 156985
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 156985
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 156985
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 156985
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 156985
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 xen  1d72d9915edff0dd41f601bbb0b1f83c02ff1689
baseline version:
 xen  17ec9b43af072051edb1380a5eb459a382dcafa3

Last test of basis   156985  2020-11-24 13:35:57 Z8 days
Testing same since   157138  2020-12-01 17:05:42 Z0 days1 attempts


People who touched revisions under test:
  Jan

Re: [PATCH v2 3/8] xen/arm: revert patch related to XArray

2020-12-02 Thread Julien Grall

Hi Rahul,

On 26/11/2020 17:02, Rahul Singh wrote:

XArray is not implemented in Xen; revert the patch that introduced the
XArray code in the SMMUv3 driver.


Similar to the atomic revert, you are explaining why the revert is needed 
but not its consequences. I think it is quite important to have them 
outlined in the commit message, as it looks like it means the SMMU driver 
would not scale.




Once XArray is implemented in Xen, this patch can be re-applied.


What's the plan for that?

Cheers,

--
Julien Grall



Re: [PATCH v2 5/8] xen/arm: Remove support for PCI ATS on SMMUv3

2020-12-02 Thread Julien Grall

Hi Rahul,

On 26/11/2020 17:02, Rahul Singh wrote:
PCI ATS functionality is not implemented or tested on Arm.


I agree that short term, this is not going to be implemented. However, I 
do expect this feature to be used medium term (most likely before the 
driver is out of tech preview).


So I think it would be best to keep the code around.

Cheers,

--
Julien Grall



Re: [PATCH v2 4/8] xen/arm: Remove support for MSI on SMMUv3

2020-12-02 Thread Julien Grall

Hi Rahul,

On 02/12/2020 13:12, Rahul Singh wrote:

Hello Stefano,


On 2 Dec 2020, at 12:40 am, Stefano Stabellini  wrote:

On Tue, 1 Dec 2020, Stefano Stabellini wrote:

On Thu, 26 Nov 2020, Rahul Singh wrote:

Xen does not support MSI on Arm platforms; therefore, remove the MSI
support from the SMMUv3 driver.

Signed-off-by: Rahul Singh 


I wonder if it makes sense to #ifdef CONFIG_MSI this code instead of
removing it completely.


One more thought: could this patch be achieved by reverting
166bdbd23161160f2abcea70621adba179050bee ? If this patch could be done
by a couple of revert, it would be great to say it in the commit
message.


  Ok will add in next version.




In the past, we tried to keep the entire file as textually similar to
the original Linux driver as possible to make it easier to backport
features and fixes. So, in this case we would probably not even use an
#ifdef but maybe something like:

  if (/* msi_enabled */ 0)
  return;

at the beginning of arm_smmu_setup_msis.


However, that strategy didn't actually work very well because backports
have proven difficult to do anyway. So at that point we might as well at
least have clean code in Xen and do the changes properly.
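For illustration only, the two approaches contrasted above could be sketched as follows. All names here (the function names, `msi_enabled`, and the `CONFIG_HAS_MSI` option) are placeholders, not the actual SMMUv3 driver code; the runtime guard is written so that it actually short-circuits when MSIs are disabled.

```c
#include <assert.h>

/* Illustrative sketch only: a runtime stub (keeps the file textually
 * close to Linux) versus compiling the code out behind a config option. */

static int msis_set_up; /* records whether the MSI setup path ran */

/* Approach 1: runtime stub, textually minimal change to the Linux code. */
static void setup_msis_stubbed(void)
{
    if (/* !msi_enabled */ 1)
        return;
    msis_set_up = 1; /* the original Linux MSI setup would follow here */
}

/* Approach 2: compile the code out entirely behind a (hypothetical)
 * config option, as the #ifdef CONFIG_MSI suggestion above would do. */
static void setup_msis_ifdef(void)
{
#ifdef CONFIG_HAS_MSI /* not defined here */
    msis_set_up = 1;
#endif
}
```

Either way the observable behaviour is the same: the MSI path never runs; the trade-off is purely between textual proximity to Linux and code cleanliness.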


It was difficult because Linux decided to rework how IOMMU drivers 
work. I agree the risk is still there, and therefore clean code would be 
better, with some caveats (see below).




The main reason to remove the feature/code that is not usable in Xen is to have 
clean code.


I agree that short term this feature will not be usable. However, I 
think we need a {medium, long}-term plan to avoid extra effort in the 
future, should the driver evolve in a way that makes reverting the 
revert impossible.


Therefore I would prefer to keep both the MSI and PCI ATS code present, as 
they are going to be useful/necessary on some platforms. It doesn't 
matter that they don't work yet, because the driver will be in tech preview.


Cheers,

--
Julien Grall



Re: [PATCH v2 7/8] xen/arm: Remove Linux specific code that is not usable in XEN

2020-12-02 Thread Rahul Singh
Hello Stefano,

> On 2 Dec 2020, at 1:48 am, Stefano Stabellini  wrote:
> 
> On Thu, 26 Nov 2020, Rahul Singh wrote:
>> struct io_pgtable_ops, struct io_pgtable_cfg, struct iommu_flush_ops,
>> and struct iommu_ops related code are linux specific.
>> 
>> Remove the code related to the above structs, as it is dead code in Xen.
> 
> There are still instances of struct io_pgtable_cfg after applying this
> patch in the following functions:
> - arm_smmu_domain_finalise_s2
> - arm_smmu_domain_finalise
> 

Ok. I will remove the instances.
> 
> 
>> Signed-off-by: Rahul Singh 
>> ---
>> xen/drivers/passthrough/arm/smmu-v3.c | 457 --
>> 1 file changed, 457 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
>> b/xen/drivers/passthrough/arm/smmu-v3.c
>> index 40e3890a58..55d1cba194 100644
>> --- a/xen/drivers/passthrough/arm/smmu-v3.c
>> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
>> @@ -599,7 +593,6 @@ struct arm_smmu_domain {
>>  struct arm_smmu_device  *smmu;
>>  struct mutexinit_mutex; /* Protects smmu pointer */
>> 
>> -struct io_pgtable_ops   *pgtbl_ops;
>>  boolnon_strict;
>> 
>>  enum arm_smmu_domain_stage  stage;
>> @@ -1297,74 +1290,6 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>  arm_smmu_cmdq_issue_sync(smmu);
>> }
>> 
>> -static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>> -  size_t granule, bool leaf, void 
>> *cookie)
>> -{
>> -struct arm_smmu_domain *smmu_domain = cookie;
>> -struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -struct arm_smmu_cmdq_ent cmd = {
>> -.tlbi = {
>> -.leaf   = leaf,
>> -.addr   = iova,
>> -},
>> -};
>> -
>> -cmd.opcode  = CMDQ_OP_TLBI_S2_IPA;
>> -cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>> -
>> -do {
>> -arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> -cmd.tlbi.addr += granule;
>> -} while (size -= granule);
>> -}
>> -
>> -static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
>> - unsigned long iova, size_t granule,
>> - void *cookie)
>> -{
>> -arm_smmu_tlb_inv_range_nosync(iova, granule, granule, true, cookie);
>> -}
>> -
>> -static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
>> -  size_t granule, void *cookie)
>> -{
>> -struct arm_smmu_domain *smmu_domain = cookie;
>> -struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -
>> -arm_smmu_tlb_inv_range_nosync(iova, size, granule, false, cookie);
>> -arm_smmu_cmdq_issue_sync(smmu);
>> -}
>> -
>> -static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
>> -  size_t granule, void *cookie)
>> -{
>> -struct arm_smmu_domain *smmu_domain = cookie;
>> -struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -
>> -arm_smmu_tlb_inv_range_nosync(iova, size, granule, true, cookie);
>> -arm_smmu_cmdq_issue_sync(smmu);
>> -}
>> -
>> -static const struct iommu_flush_ops arm_smmu_flush_ops = {
>> -.tlb_flush_all  = arm_smmu_tlb_inv_context,
>> -.tlb_flush_walk = arm_smmu_tlb_inv_walk,
>> -.tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
>> -.tlb_add_page   = arm_smmu_tlb_inv_page_nosync,
>> -};
>> -
>> -/* IOMMU API */
>> -static bool arm_smmu_capable(enum iommu_cap cap)
>> -{
>> -switch (cap) {
>> -case IOMMU_CAP_CACHE_COHERENCY:
>> -return true;
>> -case IOMMU_CAP_NOEXEC:
>> -return true;
>> -default:
>> -return false;
>> -}
>> -}
>> -
>> static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>> {
>>  struct arm_smmu_domain *smmu_domain;
>> @@ -1421,7 +1346,6 @@ static void arm_smmu_domain_free(struct iommu_domain 
>> *domain)
>>  struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
>> 
>>  iommu_put_dma_cookie(domain);
>> -free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>> 
>>  if (cfg->vmid)
>>  arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
>> @@ -1429,7 +1353,6 @@ static void arm_smmu_domain_free(struct iommu_domain 
>> *domain)
>>  kfree(smmu_domain);
>> }
>> 
>> -
>> static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>> struct arm_smmu_master *master,
>> struct io_pgtable_cfg *pgtbl_cfg)
>> @@ -1437,7 +1360,6 @@ static int arm_smmu_domain_finalise_s2(struct 
>> arm_smmu_domain *smmu_domain,
>>  int vmid;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>>  struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
>> -typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr;
>> 
>>  vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
>>  if (vmid < 0)
>> @@ -1461,20 +1383,12 @@ static int a

[libvirt test] 157150: regressions - FAIL

2020-12-02 Thread osstest service owner
flight 157150 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157150/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  0436a144681ae4e0790932e31347d3471823c12e
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  145 days
Failing since151818  2020-07-11 04:18:52 Z  144 days  139 attempts
Testing same since   157150  2020-12-02 04:20:23 Z0 days1 attempts


People who touched revisions under test:
  Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Andika Triwidada 
  Andrea Bolognani 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastien Orivel 
  Bihong Yu 
  Binfeng Wu 
  Boris Fiuczynski 
  Brian Turek 
  Christian Ehrhardt 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Erik Skultety 
  Fabian Freyer 
  Fangge Jin 
  Fedora Weblate Translation 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Ian Wienand 
  Jamie Strandboge 
  Jamie Strandboge 
  Jean-Baptiste Holcroft 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jiri Denemark 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Laine Stump 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Neal Gompa 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Orion Poplawski 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Ricky Tigg 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  Shaojun Yang 
  Simon Gaiser 
  Stefan Bader 
  Stefan Berger 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Tuguoyi 
  Wang Xin 
  Weblate 
  Yang Hang 
  Yanqiu Zhang 
  Yi Li 
  Yi Wang 
  Yuri Chornoivan 
  Zheng Chuan 
  zhenwei pi 
  Zhenyu Zheng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu

Re: [PATCH v2 7/8] xen/arm: Remove Linux specific code that is not usable in XEN

2020-12-02 Thread Julien Grall

Hi Rahul,

On 02/12/2020 14:34, Rahul Singh wrote:

dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
@@ -2595,9 +2208,6 @@ static int arm_smmu_device_dt_probe(struct 
platform_device *pdev,

parse_driver_options(smmu);

-   if (of_dma_is_coherent(dev->of_node))
-   smmu->features |= ARM_SMMU_FEAT_COHERENCY;
-


Why this change? The ARM_SMMU_FEAT_COHERENCY flag is still used in
arm_smmu_device_hw_probe.


I removed this as it is Linux-specific. I will remove the ARM_SMMU_FEAT_COHERENCY 
flag used in arm_smmu_device_hw_probe.


From my understanding, ARM_SMMU_FEAT_COHERENCY indicates whether the 
SMMU page table walker will snoop the cache. If the flag is not set, 
then Xen will have to clean to PoC every entry updated in the p2m.


Therefore, I think we need to keep this code.

In the case we don't need to keep the code, then I think the reason 
should be explained in the commit message.
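The behaviour described above can be sketched in a few lines. This is purely illustrative, not the actual driver: `fake_smmu`, `write_pte()` and `clean_dcache_to_poc()` are stand-ins showing how a coherency feature bit could gate clean-to-PoC cache maintenance after a page-table update.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only; all names are placeholders. */
#define ARM_SMMU_FEAT_COHERENCY (1u << 0)

struct fake_smmu {
    uint32_t features;
};

static unsigned int clean_calls; /* counts simulated clean-to-PoC ops */

static void clean_dcache_to_poc(const void *addr)
{
    (void)addr;
    clean_calls++;
}

/* Update a page-table entry; if the SMMU table walker does not snoop the
 * cache, the updated entry must be cleaned to the Point of Coherency so
 * the walker observes it. */
static void write_pte(const struct fake_smmu *smmu, uint64_t *pte,
                      uint64_t val)
{
    *pte = val;
    if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
        clean_dcache_to_poc(pte);
}
```

The point being made in the thread is exactly this conditional: dropping the flag silently turns every platform into the "must clean" case, or worse, skips the maintenance entirely.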


Cheers,

--
Julien Grall



Re: [PATCH v2 7/8] xen/arm: Remove Linux specific code that is not usable in XEN

2020-12-02 Thread Julien Grall




On 26/11/2020 17:02, Rahul Singh wrote:

struct io_pgtable_ops, struct io_pgtable_cfg, struct iommu_flush_ops,
and struct iommu_ops related code are linux specific.


So the assumption is we are going to support only sharing the page-tables 
between the P2M and the IOMMU. That's probably fine short term, but 
long-term we are going to need to unshare the page-tables (there are 
issues on some platforms if the ITS doorbell is accessed by the processors).



I am ok with removing anything related to the unsharing code. But I 
think it should be clarified here.




Remove the code related to the above structs, as it is dead code in Xen.

Signed-off-by: Rahul Singh 
---
  xen/drivers/passthrough/arm/smmu-v3.c | 457 --
  1 file changed, 457 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
b/xen/drivers/passthrough/arm/smmu-v3.c
index 40e3890a58..55d1cba194 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -402,13 +402,7 @@
  #define ARM_SMMU_CMDQ_SYNC_TIMEOUT_US 100 /* 1s! */
  #define ARM_SMMU_CMDQ_SYNC_SPIN_COUNT 10
  
-#define MSI_IOVA_BASE			0x800

-#define MSI_IOVA_LENGTH0x10
-
  static bool disable_bypass = 1;
-module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
-MODULE_PARM_DESC(disable_bypass,
-   "Disable bypass streams such that incoming transactions from devices that 
are not attached to an iommu domain will report an abort back to the device and will not 
be allowed to pass through the SMMU.");


Per your commit message, this doesn't look related to what you intend to 
remove. Maybe your commit message should be updated?


  
  enum pri_resp {

PRI_RESP_DENY = 0,
@@ -599,7 +593,6 @@ struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct mutexinit_mutex; /* Protects smmu pointer */
  
-	struct io_pgtable_ops		*pgtbl_ops;

boolnon_strict;
  
  	enum arm_smmu_domain_stage	stage;

@@ -1297,74 +1290,6 @@ static void arm_smmu_tlb_inv_context(void *cookie)
arm_smmu_cmdq_issue_sync(smmu);
  }
  
-static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,

- size_t granule, bool leaf, void 
*cookie)
-{
-   struct arm_smmu_domain *smmu_domain = cookie;
-   struct arm_smmu_device *smmu = smmu_domain->smmu;
-   struct arm_smmu_cmdq_ent cmd = {
-   .tlbi = {
-   .leaf   = leaf,
-   .addr   = iova,
-   },
-   };
-
-   cmd.opcode  = CMDQ_OP_TLBI_S2_IPA;
-   cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
-
-   do {
-   arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-   cmd.tlbi.addr += granule;
-   } while (size -= granule);
-}
-
-static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
-unsigned long iova, size_t granule,
-void *cookie)
-{
-   arm_smmu_tlb_inv_range_nosync(iova, granule, granule, true, cookie);
-}
-
-static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
-   struct arm_smmu_domain *smmu_domain = cookie;
-   struct arm_smmu_device *smmu = smmu_domain->smmu;
-
-   arm_smmu_tlb_inv_range_nosync(iova, size, granule, false, cookie);
-   arm_smmu_cmdq_issue_sync(smmu);
-}
-
-static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
-   struct arm_smmu_domain *smmu_domain = cookie;
-   struct arm_smmu_device *smmu = smmu_domain->smmu;
-
-   arm_smmu_tlb_inv_range_nosync(iova, size, granule, true, cookie);
-   arm_smmu_cmdq_issue_sync(smmu);
-}
-
-static const struct iommu_flush_ops arm_smmu_flush_ops = {
-   .tlb_flush_all  = arm_smmu_tlb_inv_context,
-   .tlb_flush_walk = arm_smmu_tlb_inv_walk,
-   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
-   .tlb_add_page   = arm_smmu_tlb_inv_page_nosync,
-};
-
-/* IOMMU API */
-static bool arm_smmu_capable(enum iommu_cap cap)
-{
-   switch (cap) {
-   case IOMMU_CAP_CACHE_COHERENCY:
-   return true;
-   case IOMMU_CAP_NOEXEC:
-   return true;
-   default:
-   return false;
-   }
-}
-
  static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
  {
struct arm_smmu_domain *smmu_domain;
@@ -1421,7 +1346,6 @@ static void arm_smmu_domain_free(struct iommu_domain 
*domain)
struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
  
  	iommu_put_dma_cookie(domain);

-   free_io_pgtable_ops(smmu_domain->pgtbl_ops);
  
  	if (cfg->vmid)

arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
@@ -1429,7 +1353,6 @@ static void arm_smmu_domain_free(struct iommu_domain 
*domain)
kfree(smmu_domain);
  }
 

Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call

2020-12-02 Thread Jürgen Groß

On 02.12.20 13:32, Borislav Petkov wrote:

On Fri, Nov 20, 2020 at 12:46:22PM +0100, Juergen Gross wrote:

@@ -123,12 +115,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, 
SYM_L_GLOBAL)
 * Try to use SYSRET instead of IRET if we're returning to
 * a completely clean 64-bit userspace context.  If we're not,
 * go to the slow exit path.
+* In the Xen PV case we must use iret anyway.
 */
-   movqRCX(%rsp), %rcx
-   movqRIP(%rsp), %r11
  
-	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */

-   jne swapgs_restore_regs_and_return_to_usermode
+   ALTERNATIVE __stringify( \
+   movqRCX(%rsp), %rcx; \
+   movqRIP(%rsp), %r11; \
+   cmpq%rcx, %r11; /* SYSRET requires RCX == RIP */ \
+   jne swapgs_restore_regs_and_return_to_usermode), \
+   "jmp   swapgs_restore_regs_and_return_to_usermode", 
X86_FEATURE_XENPV


Why such a big ALTERNATIVE when you can simply do:

 /*
  * Try to use SYSRET instead of IRET if we're returning to
  * a completely clean 64-bit userspace context.  If we're not,
  * go to the slow exit path.
  * In the Xen PV case we must use iret anyway.
  */
 ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", 
X86_FEATURE_XENPV

 movqRCX(%rsp), %rcx;
 movqRIP(%rsp), %r11;
 cmpq%rcx, %r11; /* SYSRET requires RCX == RIP */ \
 jne swapgs_restore_regs_and_return_to_usermode

?



I wanted to avoid the additional NOPs for the bare metal case.

If you don't mind them I can do as you are suggesting.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH 0/2] a tiny bit of header disentangling

2020-12-02 Thread Jan Beulich
While reviewing Hongyan's "x86/vmap: handle superpages in
vmap_to_mfn()" it became apparent that the interaction of
xen/mm.h and asm/page.h is problematic. Therefore some basic
page size related definitions get moved out of the latter, and
the mfn_t et al ones out of the former, each into new headers.

While various configurations build fine for me with these
changes in place, it's relatively likely that this may break
some more exotic ones. Such breakage ought to be easy to
resolve, so I hope this risk isn't going to be a hindrance to the
changes here going in.

1: include: don't use asm/page.h from common headers
2: mm: split out mfn_t / gfn_t / pfn_t definitions and helpers

Jan
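
As background for patch 2, the mfn_t / gfn_t definitions being moved are built on Xen's TYPE_SAFE pattern. A minimal standalone sketch of that pattern (simplified from the real macro, for illustration only) looks like this:

```c
#include <assert.h>

/* Sketch of the TYPE_SAFE pattern: wrap an integer in a single-member
 * struct so that mfn_t, gfn_t and plain unsigned long cannot be mixed
 * up silently by the compiler. Simplified from Xen's real macro. */
#define TYPE_SAFE(_type, _name)                                         \
    typedef struct { _type _name; } _name##_t;                          \
    static inline _name##_t _##_name(_type n)                           \
    { return (_name##_t){ n }; }                                        \
    static inline _type _name##_x(_name##_t n)                          \
    { return n._name; }

TYPE_SAFE(unsigned long, mfn)

/* Helpers such as mfn_add() then operate only on the wrapped type. */
static inline mfn_t mfn_add(mfn_t mfn, unsigned long i)
{
    return _mfn(mfn_x(mfn) + i);
}
```

Because only these typedefs and inline helpers are needed by many headers, splitting them out of the heavyweight xen/mm.h avoids the cyclic-dependency problem the cover letter describes.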



[PATCH 1/2] include: don't use asm/page.h from common headers

2020-12-02 Thread Jan Beulich
Doing so limits what can be done in (in particular included by) this per-
arch header. Abstract out page shift/size related #define-s, which is all
the respective headers care about. Extend the replacement / removal to
some x86 headers as well; some others now need to include page.h (and
they really should have before).

Arm's VADDR_BITS gets restricted to 32-bit, as its current value is
clearly wrong for 64-bit, but the constant also isn't used anywhere
right now (i.e. the #define could also be dropped altogether).

I wasn't sure about Arm's use of vaddr_t in PAGE_OFFSET(), and hence I
kept it and provided a way to override the #define in the common header.

Also drop the dead PAGE_FLAG_MASK altogether at this occasion.

Signed-off-by: Jan Beulich 

--- a/xen/arch/arm/arm64/lib/clear_page.S
+++ b/xen/arch/arm/arm64/lib/clear_page.S
@@ -14,6 +14,8 @@
  * along with this program.  If not, see .
  */
 
+#include 
+
 /*
  * Clear page @dest
  *
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -176,11 +176,6 @@
 #define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
 #define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* 
End mappings of ACPI tables */
 
-#define PAGE_SHIFT  12
-#define PAGE_SIZE   (_AC(1,L) << PAGE_SHIFT)
-#define PAGE_MASK   (~(PAGE_SIZE-1))
-#define PAGE_FLAG_MASK  (~0)
-
 #define NR_hypercalls 64
 
 #define STACK_ORDER 3
--- a/xen/include/asm-arm/current.h
+++ b/xen/include/asm-arm/current.h
@@ -1,6 +1,7 @@
 #ifndef __ARM_CURRENT_H__
 #define __ARM_CURRENT_H__
 
+#include 
 #include 
 
 #include 
--- /dev/null
+++ b/xen/include/asm-arm/page-shift.h
@@ -0,0 +1,15 @@
+#ifndef __ARM_PAGE_SHIFT_H__
+#define __ARM_PAGE_SHIFT_H__
+
+#define PAGE_SHIFT  12
+
+#define PAGE_OFFSET(ptr)((vaddr_t)(ptr) & ~PAGE_MASK)
+
+#ifdef CONFIG_ARM_64
+#define PADDR_BITS  48
+#else
+#define PADDR_BITS  40
+#define VADDR_BITS  32
+#endif
+
+#endif /* __ARM_PAGE_SHIFT_H__ */
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -2,21 +2,11 @@
 #define __ARM_PAGE_H__
 
 #include 
+#include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_ARM_64
-#define PADDR_BITS  48
-#else
-#define PADDR_BITS  40
-#endif
-#define PADDR_MASK  ((1ULL << PADDR_BITS)-1)
-#define PAGE_OFFSET(ptr)((vaddr_t)(ptr) & ~PAGE_MASK)
-
-#define VADDR_BITS  32
-#define VADDR_MASK  (~0UL)
-
 /* Shareability values for the LPAE entries */
 #define LPAE_SH_NON_SHAREABLE 0x0
 #define LPAE_SH_UNPREDICTALE  0x1
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -8,8 +8,8 @@
 #define __X86_CURRENT_H__
 
 #include 
+#include 
 #include 
-#include 
 
 /*
  * Xen's cpu stacks are 8 pages (8-page aligned), arranged as:
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -1,6 +1,8 @@
 #ifndef __ARCH_DESC_H
 #define __ARCH_DESC_H
 
+#include 
+
 /*
  * Xen reserves a memory page of GDT entries.
  * No guest GDT entries exist beyond the Xen reserved area.
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -12,7 +12,7 @@
 #ifndef _ASM_FIXMAP_H
 #define _ASM_FIXMAP_H
 
-#include 
+#include 
 
 #define FIXADDR_TOP (VMAP_VIRT_END - PAGE_SIZE)
 #define FIXADDR_X_TOP (XEN_VIRT_END - PAGE_SIZE)
--- a/xen/include/asm-x86/guest/hyperv-hcall.h
+++ b/xen/include/asm-x86/guest/hyperv-hcall.h
@@ -20,12 +20,12 @@
 #define __X86_HYPERV_HCALL_H__
 
 #include 
+#include 
 #include 
 
 #include 
 #include 
 #include 
-#include 
 
 static inline uint64_t hv_do_hypercall(uint64_t control, paddr_t input_addr,
paddr_t output_addr)
--- a/xen/include/asm-x86/guest/hyperv-tlfs.h
+++ b/xen/include/asm-x86/guest/hyperv-tlfs.h
@@ -10,8 +10,8 @@
 #define _ASM_X86_HYPERV_TLFS_H
 
 #include 
+#include 
 #include 
-#include 
 
 /*
  * While not explicitly listed in the TLFS, Hyper-V always runs with a page 
size
--- a/xen/include/asm-x86/io.h
+++ b/xen/include/asm-x86/io.h
@@ -3,7 +3,6 @@
 
 #include 
 #include 
-#include 
 
 #define readb(x) (*(volatile uint8_t  *)(x))
 #define readw(x) (*(volatile uint16_t *)(x))
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
--- /dev/null
+++ b/xen/include/asm-x86/page-shift.h
@@ -0,0 +1,26 @@
+#ifndef __X86_PAGE_SHIFT_H__
+#define __X86_PAGE_SHIFT_H__
+
+#define L1_PAGETABLE_SHIFT  12
+#define L2_PAGETABLE_SHIFT  21
+#define L3_PAGETABLE_SHIFT  30
+#define L4_PAGETABLE_SHIFT  39
+#define PAGE_SHIFT  L1_PAGETABLE_SHIFT
+#define SUPERPAGE_SHIFT L2_PAGETABLE_SHIFT
+#define ROOT_PAGETABLE_SHIFTL4_PAGETABLE_SHIFT
+
+#define PAGETABLE_ORDER 9
+#define L1_PAGETABLE_ENTRIES(1 << PAGETABLE_ORDER)
+#define L2_PAGETABLE_ENTRIES(1 << PAGETABLE_ORDER)
+#d

[PATCH 2/2] mm: split out mfn_t / gfn_t / pfn_t definitions and helpers

2020-12-02 Thread Jan Beulich
xen/mm.h has heavy dependencies, while in a number of cases only these
type definitions are needed. This separation then also allows pulling in
these definitions when including xen/mm.h would cause cyclic
dependencies.

Replace xen/mm.h inclusion where possible in include/xen/. (In
xen/iommu.h also take the opportunity and correct the few remaining
sorting issues.)

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -10,7 +10,6 @@
  * Slimmed with Xen specific support.
  */
 
-#include 
 #include 
 #include 
 #include 
--- a/xen/drivers/char/meson-uart.c
+++ b/xen/drivers/char/meson-uart.c
@@ -18,7 +18,9 @@
  * License along with this program; If not, see .
  */
 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- a/xen/drivers/char/mvebu-uart.c
+++ b/xen/drivers/char/mvebu-uart.c
@@ -18,7 +18,9 @@
  * License along with this program; If not, see .
  */
 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- a/xen/include/asm-x86/io.h
+++ b/xen/include/asm-x86/io.h
@@ -49,6 +49,7 @@ __OUT(l,,int)
 
 /* Function pointer used to handle platform specific I/O port emulation. */
 #define IOEMUL_QUIRK_STUB_BYTES 9
+struct cpu_user_regs;
 extern unsigned int (*ioemul_handle_quirk)(
 u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs);
 
--- /dev/null
+++ b/xen/include/xen/frame-num.h
@@ -0,0 +1,96 @@
+#ifndef __XEN_FRAME_NUM_H__
+#define __XEN_FRAME_NUM_H__
+
+#include 
+#include 
+
+TYPE_SAFE(unsigned long, mfn);
+#define PRI_mfn  "05lx"
+#define INVALID_MFN  _mfn(~0UL)
+/*
+ * To be used for global variable initialization. This workaround a bug
+ * in GCC < 5.0.
+ */
+#define INVALID_MFN_INITIALIZER { ~0UL }
+
+#ifndef mfn_t
+#define mfn_t /* Grep fodder: mfn_t, _mfn() and mfn_x() are defined above */
+#define _mfn
+#define mfn_x
+#undef mfn_t
+#undef _mfn
+#undef mfn_x
+#endif
+
+static inline mfn_t mfn_add(mfn_t mfn, unsigned long i)
+{
+return _mfn(mfn_x(mfn) + i);
+}
+
+static inline mfn_t mfn_max(mfn_t x, mfn_t y)
+{
+return _mfn(max(mfn_x(x), mfn_x(y)));
+}
+
+static inline mfn_t mfn_min(mfn_t x, mfn_t y)
+{
+return _mfn(min(mfn_x(x), mfn_x(y)));
+}
+
+static inline bool_t mfn_eq(mfn_t x, mfn_t y)
+{
+return mfn_x(x) == mfn_x(y);
+}
+
+TYPE_SAFE(unsigned long, gfn);
+#define PRI_gfn  "05lx"
+#define INVALID_GFN  _gfn(~0UL)
+/*
+ * To be used for global variable initialization. This workaround a bug
+ * in GCC < 5.0 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64856
+ */
+#define INVALID_GFN_INITIALIZER { ~0UL }
+
+#ifndef gfn_t
+#define gfn_t /* Grep fodder: gfn_t, _gfn() and gfn_x() are defined above */
+#define _gfn
+#define gfn_x
+#undef gfn_t
+#undef _gfn
+#undef gfn_x
+#endif
+
+static inline gfn_t gfn_add(gfn_t gfn, unsigned long i)
+{
+return _gfn(gfn_x(gfn) + i);
+}
+
+static inline gfn_t gfn_max(gfn_t x, gfn_t y)
+{
+return _gfn(max(gfn_x(x), gfn_x(y)));
+}
+
+static inline gfn_t gfn_min(gfn_t x, gfn_t y)
+{
+return _gfn(min(gfn_x(x), gfn_x(y)));
+}
+
+static inline bool_t gfn_eq(gfn_t x, gfn_t y)
+{
+return gfn_x(x) == gfn_x(y);
+}
+
+TYPE_SAFE(unsigned long, pfn);
+#define PRI_pfn  "05lx"
+#define INVALID_PFN  (~0UL)
+
+#ifndef pfn_t
+#define pfn_t /* Grep fodder: pfn_t, _pfn() and pfn_x() are defined above */
+#define _pfn
+#define pfn_x
+#undef pfn_t
+#undef _pfn
+#undef pfn_x
+#endif
+
+#endif /* __XEN_FRAME_NUM_H__ */
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -23,7 +23,7 @@
 #ifndef __XEN_GRANT_TABLE_H__
 #define __XEN_GRANT_TABLE_H__
 
-#include 
+#include 
 #include 
 #include 
 #include 
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -19,14 +19,13 @@
 #ifndef _IOMMU_H_
 #define _IOMMU_H_
 
+#include 
 #include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
+#include 
 #include 
+#include 
 #include 
 
 TYPE_SAFE(uint64_t, dfn);
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -51,103 +51,13 @@
 #define __XEN_MM_H__
 
 #include 
+#include 
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 
-TYPE_SAFE(unsigned long, mfn);
-#define PRI_mfn  "05lx"
-#define INVALID_MFN  _mfn(~0UL)
-/*
- * To be used for global variable initialization. This workaround a bug
- * in GCC < 5.0.
- */
-#define INVALID_MFN_INITIALIZER { ~0UL }
-
-#ifndef mfn_t
-#define mfn_t /* Grep fodder: mfn_t, _mfn() and mfn_x() are defined above */
-#define _mfn
-#define mfn_x
-#undef mfn_t
-#undef _mfn
-#undef mfn_x
-#endif
-
-static inline mfn_t mfn_add(mfn_t mfn, unsigned long i)
-{
-return _mfn(mfn_x(mfn) + i);
-}
-
-static inline mfn_t mfn_max(mfn_t x, mfn_t y)
-{
-return _mfn(max(mfn_x(x), mfn_x(y)));
-}
-
-static inline mfn_t mfn_min(mfn_t x, mfn_t y)
-{
-return _mfn(min(mfn_x(x), mfn_x(y)));
-}
-
-static inline bool_t mfn_eq(mfn_t x, mfn_t y)
-{
-return mfn_x(x) == mfn_x(y)

Re: [PATCH] x86/IRQ: bump max number of guests for a shared IRQ to 31

2020-12-02 Thread Igor Druzhinin
On 02/12/2020 09:25, Jan Beulich wrote:
> On 01.12.2020 00:59, Igor Druzhinin wrote:
>> Current limit of 7 is too restrictive for modern systems where one GSI
>> could be shared by potentially many PCI INTx sources where each of them
>> corresponds to a device passed through to its own guest. Some systems do not
>> apply due diligence in swizzling INTx links in case e.g. INTA is declared as
>> interrupt pin for the majority of PCI devices behind a single router,
>> resulting in overuse of a GSI.
>>
>> Signed-off-by: Igor Druzhinin 
>> ---
>>
>> If people think that would make sense - I can rework the array to a list of
>> domain pointers to avoid the limit.
> 
> Not sure about this. What is the biggest number you've found on any
> system? (I assume the chosen value of 31 has some headroom.)

The value 31 was taken as a practical maximum for one specific HP system
if all of the PCI slots in all of its riser cards are occupied with GPUs.
The value was obtained by reverse engineering their ACPI tables. Currently
we're only concerned with the number 8 (our graphics vendors do not recommend
installing more cards than that), but it's only a matter of time before it grows.
I'm also not sure why this routing scheme was chosen in the first place:
is it dictated by hardware restrictions or by firmware engineers being lazy - 
we're working with HP to determine that.

> Instead I'm wondering whether this wouldn't better be a Kconfig
> setting (or even command line controllable). There don't look to be
> any restrictions on the precise value chosen (i.e. 2**n-1 like is
> the case for old and new values here, for whatever reason), so a
> simple permitted range of like 4...64 would seem fine to specify.
> Whether the default then would want to be 8 (close to the current
> 7) or higher (around the actually observed maximum) is a different
> question.

I'm in favor of a command line argument here - it would be much less trouble
if a higher limit was suddenly necessary in the field. The default IMO
should definitely be higher than 8 - I'd stick with number 32 which to me
should cover our real world scenarios and apply some headroom for the future.

Igor



Re: [PATCH] x86/IRQ: bump max number of guests for a shared IRQ to 31

2020-12-02 Thread Jan Beulich
On 02.12.2020 15:53, Igor Druzhinin wrote:
> On 02/12/2020 09:25, Jan Beulich wrote:
>> On 01.12.2020 00:59, Igor Druzhinin wrote:
>>> Current limit of 7 is too restrictive for modern systems where one GSI
>>> could be shared by potentially many PCI INTx sources where each of them
>>> corresponds to a device passed through to its own guest. Some systems do not
>>> apply due diligence in swizzling INTx links in case e.g. INTA is declared 
>>> as
>>> interrupt pin for the majority of PCI devices behind a single router,
>>> resulting in overuse of a GSI.
>>>
>>> Signed-off-by: Igor Druzhinin 
>>> ---
>>>
>>> If people think that would make sense - I can rework the array to a list of
>>> domain pointers to avoid the limit.
>>
>> Not sure about this. What is the biggest number you've found on any
>> system? (I assume the chosen value of 31 has some headroom.)
> 
> The value 31 was taken as a practical maximum for one specific HP system
> if all of the PCI slots in all of its riser cards are occupied with GPUs.
> The value is obtained by reverse engineering their ACPI tables. Currently
> we're only concerned with number 8 (our graphics vendors do not recommend
> installing more cards than that) but it's only a matter of time before it grows.
> I'm also not sure why this routing scheme was chosen in the first place:
> is it dictated by hardware restrictions or firmware engineers being lazy - 
> we're working with HP to determine it.

Thanks for the insight.

>> Instead I'm wondering whether this wouldn't better be a Kconfig
>> setting (or even command line controllable). There don't look to be
>> any restrictions on the precise value chosen (i.e. 2**n-1 like is
>> the case for old and new values here, for whatever reason), so a
>> simple permitted range of like 4...64 would seem fine to specify.
>> Whether the default then would want to be 8 (close to the current
>> 7) or higher (around the actually observed maximum) is a different
>> question.
> 
> I'm in favor of a command line argument here - it would be much less trouble
> if a higher limit was suddenly necessary in the field. The default IMO
> should definitely be higher than 8 - I'd stick with number 32 which to me
> should cover our real world scenarios and apply some headroom for the future.

Well, I'm concerned of the extra memory overhead. Every IRQ,
sharable or not, will get the extra slots allocated with the
current scheme. Perhaps a prereq change then would be to only
allocate multi-guest arrays for sharable IRQs, effectively
shrinking the overhead in particular for all MSI ones?

Jan



Re: [PATCH v2 09/17] xen/hypfs: move per-node function pointers into a dedicated struct

2020-12-02 Thread Jan Beulich
On 01.12.2020 09:21, Juergen Gross wrote:
> @@ -297,6 +321,7 @@ int hypfs_write_leaf(struct hypfs_entry_leaf *leaf,
>  int ret;
>  
>  ASSERT(this_cpu(hypfs_locked) == hypfs_write_locked);
> +ASSERT(leaf->e.max_size);
>  
>  if ( ulen > leaf->e.max_size )
>  return -ENOSPC;
> @@ -357,6 +382,7 @@ int hypfs_write_custom(struct hypfs_entry_leaf *leaf,
>  int ret;
>  
>  ASSERT(this_cpu(hypfs_locked) == hypfs_write_locked);
> +ASSERT(leaf->e.max_size);
>  
>  /* Avoid oversized buffer allocation. */
>  if ( ulen > MAX_PARAM_SIZE )

At the first glance both of these hunks look as if they
wouldn't belong here, but I guess the ASSERT()s are getting
lifted here from hypfs_write(). (The first looks even somewhat
redundant with the immediately following if().) If this
understanding of mine is correct,
Reviewed-by: Jan Beulich 

> @@ -382,19 +408,20 @@ int hypfs_write_custom(struct hypfs_entry_leaf *leaf,
>  return ret;
>  }
>  
> +int hypfs_write_deny(struct hypfs_entry_leaf *leaf,
> + XEN_GUEST_HANDLE_PARAM(void) uaddr, unsigned int ulen)
> +{
> +return -EACCES;
> +}
> +
>  static int hypfs_write(struct hypfs_entry *entry,
> XEN_GUEST_HANDLE_PARAM(void) uaddr, unsigned long 
> ulen)

As a tangent, is there a reason these write functions don't take
handles of "const void"? (I realize this likely is nothing that
wants addressing right here.)

Jan



Re: [PATCH v2 09/17] xen/hypfs: move per-node function pointers into a dedicated struct

2020-12-02 Thread Jürgen Groß

On 02.12.20 16:36, Jan Beulich wrote:

On 01.12.2020 09:21, Juergen Gross wrote:

@@ -297,6 +321,7 @@ int hypfs_write_leaf(struct hypfs_entry_leaf *leaf,
  int ret;
  
  ASSERT(this_cpu(hypfs_locked) == hypfs_write_locked);

+ASSERT(leaf->e.max_size);
  
  if ( ulen > leaf->e.max_size )

  return -ENOSPC;
@@ -357,6 +382,7 @@ int hypfs_write_custom(struct hypfs_entry_leaf *leaf,
  int ret;
  
  ASSERT(this_cpu(hypfs_locked) == hypfs_write_locked);

+ASSERT(leaf->e.max_size);
  
  /* Avoid oversized buffer allocation. */

  if ( ulen > MAX_PARAM_SIZE )


At the first glance both of these hunks look as if they
wouldn't belong here, but I guess the ASSERT()s are getting
lifted here from hypfs_write(). (The first looks even somewhat
redundant with the immediately following if().) If this
understanding of mine is correct,


It is.


Reviewed-by: Jan Beulich 


Thanks.




@@ -382,19 +408,20 @@ int hypfs_write_custom(struct hypfs_entry_leaf *leaf,
  return ret;
  }
  
+int hypfs_write_deny(struct hypfs_entry_leaf *leaf,

+ XEN_GUEST_HANDLE_PARAM(void) uaddr, unsigned int ulen)
+{
+return -EACCES;
+}
+
  static int hypfs_write(struct hypfs_entry *entry,
 XEN_GUEST_HANDLE_PARAM(void) uaddr, unsigned long ulen)


As a tangent, is there a reason these write functions don't take
handles of "const void"? (I realize this likely is nothing that
wants addressing right here.)


No, not really.

I'll change that.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v2 11/17] xen/hypfs: add getsize() and findentry() callbacks to hypfs_funcs

2020-12-02 Thread Jan Beulich
On 01.12.2020 09:21, Juergen Gross wrote:
> Add a getsize() function pointer to struct hypfs_funcs for being able
> to have dynamically filled entries without the need to take the hypfs
> lock each time the contents are being generated.
> 
> For directories add a findentry callback to the vector and modify
> hypfs_get_entry_rel() to use it.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 
with maybe one further small adjustment:

> @@ -176,15 +188,41 @@ static int hypfs_get_path_user(char *buf,
>  return 0;
>  }
>  
> +struct hypfs_entry *hypfs_leaf_findentry(const struct hypfs_entry_dir *dir,
> + const char *name,
> + unsigned int name_len)
> +{
> +return ERR_PTR(-ENOENT);
> +}

ENOENT seems odd to me here. There looks to be no counterpart to
EISDIR, so maybe ENODATA, EACCES, or EPERM?

Jan



Re: [PATCH v2 15/17] xen/cpupool: add cpupool directories

2020-12-02 Thread Jürgen Groß

On 01.12.20 09:21, Juergen Gross wrote:

Add /cpupool/ directories to hypfs. Those are completely
dynamic, so the related hypfs access functions need to be implemented.

Signed-off-by: Juergen Gross 
---
V2:
- added const (Jan Beulich)
- call hypfs_add_dir() in helper (Dario Faggioli)
- switch locking to enter/exit callbacks
---
  docs/misc/hypfs-paths.pandoc |   9 +++
  xen/common/sched/cpupool.c   | 122 +++
  2 files changed, 131 insertions(+)

diff --git a/docs/misc/hypfs-paths.pandoc b/docs/misc/hypfs-paths.pandoc
index 6c7b2f7ee3..aaca1cdf92 100644
--- a/docs/misc/hypfs-paths.pandoc
+++ b/docs/misc/hypfs-paths.pandoc
@@ -175,6 +175,15 @@ The major version of Xen.
  
  The minor version of Xen.
  
+ /cpupool/

+
+A directory of all current cpupools.
+
+ /cpupool/*/
+
+The individual cpupools. Each entry is a directory with the name being the
+cpupool-id (e.g. /cpupool/0/).
+
   /params/
  
  A directory of runtime parameters.

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 0db7d77219..3e17fdf95b 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -13,6 +13,8 @@
  
  #include 

  #include 
+#include 
+#include 
  #include 
  #include 
  #include 
@@ -33,6 +35,7 @@ static int cpupool_moving_cpu = -1;
  static struct cpupool *cpupool_cpu_moving = NULL;
  static cpumask_t cpupool_locked_cpus;
  
+/* This lock nests inside sysctl or hypfs lock. */

  static DEFINE_SPINLOCK(cpupool_lock);
  
  static enum sched_gran __read_mostly opt_sched_granularity = SCHED_GRAN_cpu;

@@ -1003,12 +1006,131 @@ static struct notifier_block cpu_nfb = {
  .notifier_call = cpu_callback
  };
  
+#ifdef CONFIG_HYPFS

+static const struct hypfs_entry *cpupool_pooldir_enter(
+const struct hypfs_entry *entry);
+
+static struct hypfs_funcs cpupool_pooldir_funcs = {
+.enter = cpupool_pooldir_enter,
+.exit = hypfs_node_exit,
+.read = hypfs_read_dir,
+.write = hypfs_write_deny,
+.getsize = hypfs_getsize,
+.findentry = hypfs_dir_findentry,
+};
+
+static HYPFS_VARDIR_INIT(cpupool_pooldir, "%u", &cpupool_pooldir_funcs);
+
+static const struct hypfs_entry *cpupool_pooldir_enter(
+const struct hypfs_entry *entry)
+{
+return &cpupool_pooldir.e;
+}

I have found a more generic way to handle entering a dyndir node,
resulting in no need to have cpupool_pooldir_enter() and
cpupool_pooldir_funcs.

This will add some more lines to the previous patch, but less than
saved here.


Juergen




Re: [PATCH v2 11/17] xen/hypfs: add getsize() and findentry() callbacks to hypfs_funcs

2020-12-02 Thread Jürgen Groß

On 02.12.20 16:42, Jan Beulich wrote:

On 01.12.2020 09:21, Juergen Gross wrote:

Add a getsize() function pointer to struct hypfs_funcs for being able
to have dynamically filled entries without the need to take the hypfs
lock each time the contents are being generated.

For directories add a findentry callback to the vector and modify
hypfs_get_entry_rel() to use it.

Signed-off-by: Juergen Gross 


Reviewed-by: Jan Beulich 
with maybe one further small adjustment:


@@ -176,15 +188,41 @@ static int hypfs_get_path_user(char *buf,
  return 0;
  }
  
+struct hypfs_entry *hypfs_leaf_findentry(const struct hypfs_entry_dir *dir,

+ const char *name,
+ unsigned int name_len)
+{
+return ERR_PTR(-ENOENT);
+}


ENOENT seems odd to me here. There looks to be no counterpart to
EISDIR, so maybe ENODATA, EACCES, or EPERM?


Hmm, why?

In case I have /a/b and I'm looking for /a/b/c ENOENT seems to be just
fine?

Or I could add ENOTDIR.


Juergen




Re: [PATCH v2 8/8] xen/arm: Add support for SMMUv3 driver

2020-12-02 Thread Julien Grall

Hi Rahul,

On 26/11/2020 17:02, Rahul Singh wrote:

Add support for ARM architected SMMUv3 implementation. It is based on
the Linux SMMUv3 driver.

Major differences with regard to Linux driver are as follows:
1. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
2. Use P2M  page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
3. Tasklets are used in place of threaded IRQ's in Linux for event queue
and priority queue IRQ handling.


On the previous version, we discussed that using tasklets is not a 
suitable replacement for threaded IRQs. What's the plan to address it?



4. Latest version of the Linux SMMUv3 code implements the commands queue
access functions based on atomic operations implemented in Linux.
Atomic functions used by the commands queue access functions are not
implemented in XEN therefore we decided to port the earlier version
of the code. Once the proper atomic operations will be available in
XEN the driver can be updated.
5. Driver is currently supported as Tech Preview.

Signed-off-by: Rahul Singh 
---
  MAINTAINERS   |   6 +
  SUPPORT.md|   1 +
  xen/drivers/passthrough/Kconfig   |  10 +
  xen/drivers/passthrough/arm/Makefile  |   1 +
  xen/drivers/passthrough/arm/smmu-v3.c | 986 +-
  5 files changed, 814 insertions(+), 190 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index dab38a6a14..1d63489eec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -249,6 +249,12 @@ F: xen/include/asm-arm/
  F:xen/include/public/arch-arm/
  F:xen/include/public/arch-arm.h
  
+ARM SMMUv3

+M: Bertrand Marquis 
+M: Rahul Singh 
+S: Supported
+F: xen/drivers/passthrough/arm/smmu-v3.c
+
  Change Log
  M:Paul Durrant 
  R:Community Manager 
diff --git a/SUPPORT.md b/SUPPORT.md
index ab02aca5f4..e402c7202d 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -68,6 +68,7 @@ For the Cortex A57 r0p0 - r1p1, see Errata 832075.
  Status, ARM SMMUv1: Supported, not security supported
  Status, ARM SMMUv2: Supported, not security supported
  Status, Renesas IPMMU-VMSA: Supported, not security supported
+Status, ARM SMMUv3: Tech Preview


Please move this right after "ARM SMMUv2".

  
  ### ARM/GICv3 ITS
  
diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig

index 0036007ec4..5b71c59f47 100644
--- a/xen/drivers/passthrough/Kconfig
+++ b/xen/drivers/passthrough/Kconfig
@@ -13,6 +13,16 @@ config ARM_SMMU
  Say Y here if your SoC includes an IOMMU device implementing the
  ARM SMMU architecture.
  
+config ARM_SMMU_V3

+   bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
+   depends on ARM_64
+   ---help---
+Support for implementations of the ARM System MMU architecture
+version 3.
+
+Say Y here if your system includes an IOMMU device implementing
+the ARM SMMUv3 architecture.
+
  config IPMMU_VMSA
bool "Renesas IPMMU-VMSA found in R-Car Gen3 SoCs"
depends on ARM_64
diff --git a/xen/drivers/passthrough/arm/Makefile 
b/xen/drivers/passthrough/arm/Makefile
index fcd918ea3e..c5fb3b58a5 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1,3 +1,4 @@
  obj-y += iommu.o iommu_helpers.o iommu_fwspec.o
  obj-$(CONFIG_ARM_SMMU) += smmu.o
  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
+obj-$(CONFIG_ARM_SMMU_V3) += smmu-v3.o
diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
b/xen/drivers/passthrough/arm/smmu-v3.c
index 55d1cba194..8f2337e7f2 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -2,36 +2,280 @@
  /*
   * IOMMU API for ARM architected SMMUv3 implementations.
   *
- * Copyright (C) 2015 ARM Limited
+ * Based on Linux's SMMUv3 driver:
+ *drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+ *commit: 951cbbc386ff01b50da4f46387e994e81d9ab431
+ * and Xen's SMMU driver:
+ *xen/drivers/passthrough/arm/smmu.c


I would suggest to list the major differences here as well.


   *
- * Author: Will Deacon 
+ * Copyright (C) 2015 ARM Limited Will Deacon 


Why did you merge the Author and copyright line?


   *
- * This driver is powered by bad coffee and bombay mix.
+ * Copyright (C) 2020 Arm Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see 

Re: [PATCH v2 8/8] xen/arm: Add support for SMMUv3 driver

2020-12-02 Thread Rahul Singh
Hello Stefano,

> On 2 Dec 2020, at 2:51 am, Stefano Stabellini  wrote:
> 
> On Thu, 26 Nov 2020, Rahul Singh wrote:
>> Add support for ARM architected SMMUv3 implementation. It is based on
>> the Linux SMMUv3 driver.
>> 
>> Major differences with regard to Linux driver are as follows:
>> 1. Only Stage-2 translation is supported as compared to the Linux driver
>>   that supports both Stage-1 and Stage-2 translations.
>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>   capability to share the page tables with the CPU.
>> 3. Tasklets are used in place of threaded IRQ's in Linux for event queue
>>   and priority queue IRQ handling.
>> 4. Latest version of the Linux SMMUv3 code implements the commands queue
>>   access functions based on atomic operations implemented in Linux.
>>   Atomic functions used by the commands queue access functions are not
>>   implemented in XEN therefore we decided to port the earlier version
>>   of the code. Once the proper atomic operations will be available in
>>   XEN the driver can be updated.
>> 5. Driver is currently supported as Tech Preview.
> 
> This patch is big and was difficult to review, nonetheless I tried :-)

Thanks again for reviewing the code.
> 
> That said, the code is self-contained, marked as Tech Preview, and not
> at risk of making other things unstable, so it is low risk to accept it.
> 
> Comments below.
> 
> 
>> Signed-off-by: Rahul Singh 
>> ---
>> MAINTAINERS   |   6 +
>> SUPPORT.md|   1 +
>> xen/drivers/passthrough/Kconfig   |  10 +
>> xen/drivers/passthrough/arm/Makefile  |   1 +
>> xen/drivers/passthrough/arm/smmu-v3.c | 986 +-
>> 5 files changed, 814 insertions(+), 190 deletions(-)
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index dab38a6a14..1d63489eec 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -249,6 +249,12 @@ F:  xen/include/asm-arm/
>> F:   xen/include/public/arch-arm/
>> F:   xen/include/public/arch-arm.h
>> 
>> +ARM SMMUv3
>> +M:  Bertrand Marquis 
>> +M:  Rahul Singh 
>> +S:  Supported
>> +F:  xen/drivers/passthrough/arm/smmu-v3.c
>> +
>> Change Log
>> M:   Paul Durrant 
>> R:   Community Manager 
>> diff --git a/SUPPORT.md b/SUPPORT.md
>> index ab02aca5f4..e402c7202d 100644
>> --- a/SUPPORT.md
>> +++ b/SUPPORT.md
>> @@ -68,6 +68,7 @@ For the Cortex A57 r0p0 - r1p1, see Errata 832075.
>> Status, ARM SMMUv1: Supported, not security supported
>> Status, ARM SMMUv2: Supported, not security supported
>> Status, Renesas IPMMU-VMSA: Supported, not security supported
>> +Status, ARM SMMUv3: Tech Preview
>> 
>> ### ARM/GICv3 ITS
>> 
>> diff --git a/xen/drivers/passthrough/Kconfig 
>> b/xen/drivers/passthrough/Kconfig
>> index 0036007ec4..5b71c59f47 100644
>> --- a/xen/drivers/passthrough/Kconfig
>> +++ b/xen/drivers/passthrough/Kconfig
>> @@ -13,6 +13,16 @@ config ARM_SMMU
>>Say Y here if your SoC includes an IOMMU device implementing the
>>ARM SMMU architecture.
>> 
>> +config ARM_SMMU_V3
>> +bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
>> +depends on ARM_64
>> +---help---
>> + Support for implementations of the ARM System MMU architecture
>> + version 3.
>> +
>> + Say Y here if your system includes an IOMMU device implementing
>> + the ARM SMMUv3 architecture.
>> +
>> config IPMMU_VMSA
>>  bool "Renesas IPMMU-VMSA found in R-Car Gen3 SoCs"
>>  depends on ARM_64
>> diff --git a/xen/drivers/passthrough/arm/Makefile 
>> b/xen/drivers/passthrough/arm/Makefile
>> index fcd918ea3e..c5fb3b58a5 100644
>> --- a/xen/drivers/passthrough/arm/Makefile
>> +++ b/xen/drivers/passthrough/arm/Makefile
>> @@ -1,3 +1,4 @@
>> obj-y += iommu.o iommu_helpers.o iommu_fwspec.o
>> obj-$(CONFIG_ARM_SMMU) += smmu.o
>> obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
>> +obj-$(CONFIG_ARM_SMMU_V3) += smmu-v3.o
>> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
>> b/xen/drivers/passthrough/arm/smmu-v3.c
>> index 55d1cba194..8f2337e7f2 100644
>> --- a/xen/drivers/passthrough/arm/smmu-v3.c
>> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
>> @@ -2,36 +2,280 @@
>> /*
>>  * IOMMU API for ARM architected SMMUv3 implementations.
>>  *
>> - * Copyright (C) 2015 ARM Limited
>> + * Based on Linux's SMMUv3 driver:
>> + *drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> + *commit: 951cbbc386ff01b50da4f46387e994e81d9ab431
>> + * and Xen's SMMU driver:
>> + *xen/drivers/passthrough/arm/smmu.c
>>  *
>> - * Author: Will Deacon 
>> + * Copyright (C) 2015 ARM Limited Will Deacon 
>>  *
>> - * This driver is powered by bad coffee and bombay mix.
>> + * Copyright (C) 2020 Arm Ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRA

Re: [PATCH] x86/IRQ: bump max number of guests for a shared IRQ to 31

2020-12-02 Thread Igor Druzhinin
On 02/12/2020 15:21, Jan Beulich wrote:
> On 02.12.2020 15:53, Igor Druzhinin wrote:
>> On 02/12/2020 09:25, Jan Beulich wrote:
>>> Instead I'm wondering whether this wouldn't better be a Kconfig
>>> setting (or even command line controllable). There don't look to be
>>> any restrictions on the precise value chosen (i.e. 2**n-1 like is
>>> the case for old and new values here, for whatever reason), so a
>>> simple permitted range of like 4...64 would seem fine to specify.
>>> Whether the default then would want to be 8 (close to the current
>>> 7) or higher (around the actually observed maximum) is a different
>>> question.
>>
>> I'm in favor of a command line argument here - it would be much less trouble
>> if a higher limit was suddenly necessary in the field. The default IMO
>> should definitely be higher than 8 - I'd stick with number 32 which to me
>> should cover our real world scenarios and apply some headroom for the future.
> 
> Well, I'm concerned of the extra memory overhead. Every IRQ,
> sharable or not, will get the extra slots allocated with the
> current scheme. Perhaps a prereq change then would be to only
> allocate multi-guest arrays for sharable IRQs, effectively
> shrinking the overhead in particular for all MSI ones?

That's one way to improve overall system scalability but in that area
there are certainly much bigger fish to fry elsewhere. With 32 elements in the
array we get 200 bytes of overhead per structure; with 16 it's just 72 extra
bytes, which in the unattainable worst-case scenario of every single vector taken
in a 512-CPU machine would only account for several MB of overhead.

I'd start with dynamic array allocation first and set the limit to 16, which
should be enough for now. Then, if that default value needs to be raised,
we can consider further improvements.

Igor



Re: Xen 4.15: Proposed release schedule

2020-12-02 Thread Ian Jackson
Julien Grall writes ("Re: Xen 4.15: Proposed release schedule"):
> Therefore, would it be possible to push the "Feature Freeze" by week?

Sure.  To expand on that: this was the only comment on my proposal,
and I'm happy to accept such a minor change.

Adjusting the LPD too, but not the other two tentative dates, leads to
this schedule:

   Friday 15th JanuaryLast posting date

 Patches adding new features should be posted to the mailing list
 by this date, although perhaps not in their final version.

   Friday 29th January   Feature freeze
  
 Patches adding new features should be committed by this date.
 Straightforward bugfixes may continue to be accepted by
 maintainers.

   Friday 12th February **tentative**   Code freeze

 Bugfixes only, all changes to be approved by the Release Manager.

   Week of 12th March **tentative**   Release
 (probably Tuesday or Wednesday)

Unless anyone has objections, I will declare this official, tomorrow.

Ian.



[PATCH v1] tools/hotplug: allow tuning of xenwatchdogd arguments

2020-12-02 Thread Olaf Hering
Currently the arguments for xenwatchdogd are hardcoded with 15s
keep-alive interval and 30s timeout.

It is not possible to tweak these values via
/etc/systemd/system/xen-watchdog.service.d/*.conf because ExecStart
can not be replaced. The only option would be a private copy
/etc/systemd/system/xen-watchdog.service, which may get out of sync
with the Xen provided xen-watchdog.service.

Adjust the service file to recognize XENWATCHDOGD_ARGS= in a
private unit configuration file.

Signed-off-by: Olaf Hering 
---
 tools/hotplug/Linux/init.d/xen-watchdog.in  | 7 ++-
 tools/hotplug/Linux/systemd/xen-watchdog.service.in | 4 +++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/hotplug/Linux/init.d/xen-watchdog.in 
b/tools/hotplug/Linux/init.d/xen-watchdog.in
index c05f1f6b6a..87e2353b49 100644
--- a/tools/hotplug/Linux/init.d/xen-watchdog.in
+++ b/tools/hotplug/Linux/init.d/xen-watchdog.in
@@ -19,6 +19,11 @@
 
 . @XEN_SCRIPT_DIR@/hotplugpath.sh
 
+xencommons_config=@CONFIG_DIR@/@CONFIG_LEAF_DIR@
+
+test -f $xencommons_config/xencommons && . $xencommons_config/xencommons
+
+test -z "$XENWATCHDOGD_ARGS" && XENWATCHDOGD_ARGS='30 15'
 DAEMON=${sbindir}/xenwatchdogd
 base=$(basename $DAEMON)
 
@@ -46,7 +51,7 @@ start() {
local r
echo -n $"Starting domain watchdog daemon: "
 
-   $DAEMON 30 15
+   $DAEMON $XENWATCHDOGD_ARGS
r=$?
[ "$r" -eq 0 ] && success $"$base startup" || failure $"$base startup"
echo
diff --git a/tools/hotplug/Linux/systemd/xen-watchdog.service.in 
b/tools/hotplug/Linux/systemd/xen-watchdog.service.in
index 1eecd2a616..637ab7fd7f 100644
--- a/tools/hotplug/Linux/systemd/xen-watchdog.service.in
+++ b/tools/hotplug/Linux/systemd/xen-watchdog.service.in
@@ -6,7 +6,9 @@ ConditionPathExists=/proc/xen/capabilities
 
 [Service]
 Type=forking
-ExecStart=@sbindir@/xenwatchdogd 30 15
+Environment="XENWATCHDOGD_ARGS=30 15"
+EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons
+ExecStart=@sbindir@/xenwatchdogd $XENWATCHDOGD_ARGS
 KillSignal=USR1
 
 [Install]
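As a usage sketch (the config path below is illustrative; the real location depends on @CONFIG_DIR@/@CONFIG_LEAF_DIR@): with this change an admin overrides the watchdog timing from the xencommons file read by both the init script and the systemd unit, and a shell default-setting idiom supplies the previously hardcoded '30 15' only when the variable is left empty:

```shell
# Hypothetical override: set a 60s timeout with a 20s keep-alive interval
# by adding the variable to the xencommons config file, e.g.:
#   echo "XENWATCHDOGD_ARGS='60 20'" >> /etc/default/xencommons
#
# Default-setting idiom: only assign the built-in default ('30 15',
# matching the previously hardcoded invocation) when the variable is
# still unset or empty.
XENWATCHDOGD_ARGS=''
test -z "$XENWATCHDOGD_ARGS" && XENWATCHDOGD_ARGS='30 15'
echo "$XENWATCHDOGD_ARGS"    # prints: 30 15
```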



Re: [PATCH v2 8/8] xen/arm: Add support for SMMUv3 driver

2020-12-02 Thread Julien Grall

Hi Stefano,

On 02/12/2020 02:51, Stefano Stabellini wrote:

On Thu, 26 Nov 2020, Rahul Singh wrote:



+/* Alias to Xen device tree helpers */
+#define device_node dt_device_node
+#define of_phandle_args dt_phandle_args
+#define of_device_id dt_device_match
+#define of_match_node dt_match_node
+#define of_property_read_u32(np, pname, out) (!dt_property_read_u32(np, pname, 
out))
+#define of_property_read_bool dt_property_read_bool
+#define of_parse_phandle_with_args dt_parse_phandle_with_args


Given all the changes to the file by the previous patches we are
basically fully (or almost fully) adapting this code to Xen.

So at that point I wonder if we should just as well make these changes
(e.g. s/of_phandle_args/dt_phandle_args/g) to the code too.


I have already accepted the fact that keeping Linux code as-is is nearly 
impossible without many workarounds :). The benefits also tend to be 
limited, as we noticed with the SMMU driver.


I would like to point out that this may make it quite difficult (if not 
impossible) to revert the previous patches which remove support for some 
features (e.g. atomic, MSI, ATS).


If we are going to adapt the code to Xen (I'd like to keep Linux code 
style though), then I think we should consider keeping code that may be 
useful in the near future (at least MSI, ATS).





+#define FIELD_GET(_mask, _reg)  \
+(typeof(_mask))(((_reg) & (_mask)) >> (__builtin_ffsll(_mask) - 1))
+
+#define WRITE_ONCE(x, val)  \
+do {\
+*(volatile typeof(x) *)&(x) = (val);\
+} while (0)


maybe we should define this in xen/include/xen/lib.h


I have attempted such a discussion in the past and it resulted in more 
bikeshedding than it is worth. So I would suggest re-implementing 
WRITE_ONCE() as write_atomic() for now.


Cheers,

--
Julien Grall



Re: [PATCH v2] xen/irq: Propagate the error from init_one_desc_irq() in init_*_irq_data()

2020-12-02 Thread Bertrand Marquis


> On 28 Nov 2020, at 11:36, Julien Grall  wrote:
> 
> From: Julien Grall 
> 
> init_one_desc_irq() can return an error if it is unable to allocate
> memory. While this is unlikely to happen during boot (called from
> init_{,local_}irq_data()), it is better to harden the code by
> propagating the return value.
> 
> Spotted by coverity.
> 
> CID: 106529
> 
> Signed-off-by: Julien Grall 
> Reviewed-by: Roger Pau Monné 

Reviewed-by: Bertrand Marquis 

Cheers
Bertrand

> 
> ---
>Changes in v2:
>- Add Roger's reviewed-by for x86
>- Handle
> ---
> xen/arch/arm/irq.c | 12 ++--
> xen/arch/x86/irq.c |  7 ++-
> 2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 3877657a5277..b71b099e6fa2 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -88,7 +88,11 @@ static int __init init_irq_data(void)
> for ( irq = NR_LOCAL_IRQS; irq < NR_IRQS; irq++ )
> {
> struct irq_desc *desc = irq_to_desc(irq);
> -init_one_irq_desc(desc);
> +int rc = init_one_irq_desc(desc);
> +
> +if ( rc )
> +return rc;
> +
> desc->irq = irq;
> desc->action  = NULL;
> }
> @@ -105,7 +109,11 @@ static int init_local_irq_data(void)
> for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
> {
> struct irq_desc *desc = irq_to_desc(irq);
> -init_one_irq_desc(desc);
> +int rc = init_one_irq_desc(desc);
> +
> +if ( rc )
> +return rc;
> +
> desc->irq = irq;
> desc->action  = NULL;
> 
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 45966947919e..3ebd684415ac 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -428,9 +428,14 @@ int __init init_irq_data(void)
> 
> for ( irq = 0; irq < nr_irqs_gsi; irq++ )
> {
> +int rc;
> +
> desc = irq_to_desc(irq);
> desc->irq = irq;
> -init_one_irq_desc(desc);
> +
> +rc = init_one_irq_desc(desc);
> +if ( rc )
> +return rc;
> }
> for ( ; irq < nr_irqs; irq++ )
> irq_to_desc(irq)->irq = irq;
> -- 
> 2.17.1
> 
> 



Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call

2020-12-02 Thread Borislav Petkov
On Wed, Dec 02, 2020 at 03:48:21PM +0100, Jürgen Groß wrote:
> I wanted to avoid the additional NOPs for the bare metal case.

Yeah, in that case it gets optimized to a single NOP:

[0.176692] SMP alternatives: 81a00068: [0:5) optimized NOPs: 0f 1f 
44 00 00

which is nopl 0x0(%rax,%rax,1) and I don't think that's noticeable on
modern CPUs where a NOP is basically a rIP increment only and that goes
down the pipe almost for free. :-)

> If you don't mind them I can do as you are suggesting.

Yes pls, I think asm readability is more important than a 5-byte NOP.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 1/2] include: don't use asm/page.h from common headers

2020-12-02 Thread Julien Grall

Hi Jan,

On 02/12/2020 14:49, Jan Beulich wrote:

Doing so limits what can be done in (in particular included by) this per-
arch header. Abstract out page shift/size related #define-s, which is all
the repsecitve headers care about. Extend the replacement / removal to


s/repsecitve/respective/


some x86 headers as well; some others now need to include page.h (and
they really should have before).

Arm's VADDR_BITS gets restricted to 32-bit, as its current value is
clearly wrong for 64-bit, but the constant also isn't used anywhere
right now (i.e. the #define could also be dropped altogether).


Whoops. Thankfully this is not used.



I wasn't sure about Arm's use of vaddr_t in PAGE_OFFSET(), and hence I
kept it and provided a way to override the #define in the common header.


vaddr_t is defined as 32-bit on arm32 and 64-bit on arm64. So I think 
it would be fine to use the generic PAGE_OFFSET() implementation.




Also drop the dead PAGE_FLAG_MASK altogether at this occasion.

Signed-off-by: Jan Beulich 

--- a/xen/arch/arm/arm64/lib/clear_page.S
+++ b/xen/arch/arm/arm64/lib/clear_page.S
@@ -14,6 +14,8 @@
   * along with this program.  If not, see .
   */
  
+#include 

+
  /*
   * Clear page @dest
   *
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -176,11 +176,6 @@
  #define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
  #define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  
/* End mappings of ACPI tables */
  
-#define PAGE_SHIFT  12

-#define PAGE_SIZE   (_AC(1,L) << PAGE_SHIFT)
-#define PAGE_MASK   (~(PAGE_SIZE-1))
-#define PAGE_FLAG_MASK  (~0)
-
  #define NR_hypercalls 64
  
  #define STACK_ORDER 3

--- a/xen/include/asm-arm/current.h
+++ b/xen/include/asm-arm/current.h
@@ -1,6 +1,7 @@
  #ifndef __ARM_CURRENT_H__
  #define __ARM_CURRENT_H__
  
+#include 

  #include 
  
  #include 

--- /dev/null
+++ b/xen/include/asm-arm/page-shift.h


The name of the file looks a bit odd given that *_BITS are also defined 
in it. So how about renaming to page-size.h?



@@ -0,0 +1,15 @@
+#ifndef __ARM_PAGE_SHIFT_H__
+#define __ARM_PAGE_SHIFT_H__
+
+#define PAGE_SHIFT  12
+
+#define PAGE_OFFSET(ptr)((vaddr_t)(ptr) & ~PAGE_MASK)
+
+#ifdef CONFIG_ARM_64
+#define PADDR_BITS  48


Shouldn't we define VADDR_BITS here? But I wonder whether VADDR_BITS 
should be defined as sizeof(vaddr_t) * 8.


This would require including asm/types.h.


+#else
+#define PADDR_BITS  40
+#define VADDR_BITS  32
+#endif
+
+#endif /* __ARM_PAGE_SHIFT_H__ */
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -2,21 +2,11 @@
  #define __ARM_PAGE_H__
  
  #include 

+#include 
  #include 
  #include 
  #include 
  
-#ifdef CONFIG_ARM_64

-#define PADDR_BITS  48
-#else
-#define PADDR_BITS  40
-#endif
-#define PADDR_MASK  ((1ULL << PADDR_BITS)-1)
-#define PAGE_OFFSET(ptr)((vaddr_t)(ptr) & ~PAGE_MASK)
-
-#define VADDR_BITS  32
-#define VADDR_MASK  (~0UL)
-
  /* Shareability values for the LPAE entries */
  #define LPAE_SH_NON_SHAREABLE 0x0
  #define LPAE_SH_UNPREDICTALE  0x1
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -8,8 +8,8 @@
  #define __X86_CURRENT_H__
  
  #include 

+#include 
  #include 
-#include 
  
  /*

   * Xen's cpu stacks are 8 pages (8-page aligned), arranged as:
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -1,6 +1,8 @@
  #ifndef __ARCH_DESC_H
  #define __ARCH_DESC_H
  
+#include 


May I ask why you are including  and not  here?

Cheers,

--
Julien Grall



Re: [PATCH 2/2] mm: split out mfn_t / gfn_t / pfn_t definitions and helpers

2020-12-02 Thread Julien Grall

Hi Jan,

On 02/12/2020 14:50, Jan Beulich wrote:

xen/mm.h has heavy dependencies, while in a number of cases only these
type definitions are needed. This separation then also allows pulling in
these definitions when including xen/mm.h would cause cyclic
dependencies.

Replace xen/mm.h inclusion where possible in include/xen/. (In
xen/iommu.h also take the opportunity and correct the few remaining
sorting issues.)

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -10,7 +10,6 @@
   * Slimmed with Xen specific support.
   */
  
-#include 


This seems to be unrelated of this work.


  #include 
  #include 
  #include 
--- a/xen/drivers/char/meson-uart.c
+++ b/xen/drivers/char/meson-uart.c
@@ -18,7 +18,9 @@
   * License along with this program; If not, see 
.
   */
  
+#include 

  #include 
+#include 


I was going to ask why xen/mm.h needs to be included here. But it looks 
like ioremap_nocache() is defined in mm.h rather than vmap.h.


I will add it as a clean-up on my side.



  #include 
  #include 
  #include 
--- a/xen/drivers/char/mvebu-uart.c
+++ b/xen/drivers/char/mvebu-uart.c
@@ -18,7 +18,9 @@
   * License along with this program; If not, see 
.
   */
  
+#include 

  #include 
+#include 
  #include 
  #include 
  #include 
--- a/xen/include/asm-x86/io.h
+++ b/xen/include/asm-x86/io.h
@@ -49,6 +49,7 @@ __OUT(l,,int)
  
  /* Function pointer used to handle platform specific I/O port emulation. */

  #define IOEMUL_QUIRK_STUB_BYTES 9
+struct cpu_user_regs;
  extern unsigned int (*ioemul_handle_quirk)(
  u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs);
  
--- /dev/null

+++ b/xen/include/xen/frame-num.h


It would feel more natural to me if the file is named mm-types.h.

Cheers,

--
Julien Grall



Re: [PATCH] xen/iommu: vtd: Fix undefined behavior pci_vtd_quirks()

2020-12-02 Thread Julien Grall

Hi,

On 30/11/2020 02:50, Tian, Kevin wrote:

From: Julien Grall 
Sent: Thursday, November 19, 2020 10:52 PM

From: Julien Grall 

When booting Xen with CONFIG_USBAN=y on Sandy Bridge, UBSAN will
throw
the following splat:

(XEN)


(XEN) UBSAN: Undefined behaviour in quirks.c:449:63
(XEN) left shift of 1 by 31 places cannot be represented in type 'int'
(XEN) [ Xen-4.11.4  x86_64  debug=y   Not tainted ]

[...]

(XEN) Xen call trace:
(XEN)[] ubsan.c#ubsan_epilogue+0xa/0xad
(XEN)[]
__ubsan_handle_shift_out_of_bounds+0xb4/0x145
(XEN)[] pci_vtd_quirk+0x3d3/0x74f
(XEN)[]
iommu.c#domain_context_mapping+0x45b/0x46f
(XEN)[] iommu.c#setup_hwdom_device+0x22/0x3a
(XEN)[] pci.c#setup_one_hwdom_device+0x8c/0x124
(XEN)[] pci.c#_setup_hwdom_pci_devices+0xbb/0x2f7
(XEN)[] pci.c#pci_segments_iterate+0x4c/0x8c
(XEN)[] setup_hwdom_pci_devices+0x25/0x2c
(XEN)[]
iommu.c#intel_iommu_hwdom_init+0x52/0x2f3
(XEN)[] iommu_hwdom_init+0x4e/0xa4
(XEN)[] dom0_construct_pv+0x23c8/0x2476
(XEN)[] construct_dom0+0x6c/0xa3
(XEN)[] __start_xen+0x4651/0x4b55
(XEN)[] __high_start+0x53/0x55

Note that splat is from 4.11.4 and not staging. Although, the problem is
still present.

This can be solved by making the first operand unsigned int.

Signed-off-by: Julien Grall 


Reviewed-by: Kevin Tian 


Thanks! I have committed the patch.

Cheers,

--
Julien Grall



Re: [PATCH] xen/arm: Add Cortex-A73 erratum 858921 workaround

2020-12-02 Thread Julien Grall




On 26/11/2020 11:27, Wei Chen wrote:

Hi Julien,


Hi Wei,


-Original Message-
From: Julien Grall 
Sent: 2020年11月26日 18:55
To: Wei Chen ; Stefano Stabellini 
Cc: Penny Zheng ; xen-devel@lists.xenproject.org;
Andre Przywara ; Bertrand Marquis
; Kaly Xin ; nd

Subject: Re: [PATCH] xen/arm: Add Cortex-A73 erratum 858921 workaround

Hi Wei,

Your e-mail font seems to be different to the usual plain text one. Are
you sending the e-mail using HTML by any chance?



It's strange, I always use the plain text.


Maybe exchange decided to mangle the e-mail :). Anyway, this new message 
looks fine.





On 26/11/2020 02:07, Wei Chen wrote:

Hi Stefano,


-Original Message-
From: Stefano Stabellini 
Sent: 2020年11月26日 8:00
To: Wei Chen 
Cc: Julien Grall ; Penny Zheng ;

xen-

de...@lists.xenproject.org; sstabell...@kernel.org; Andre Przywara
; Bertrand Marquis

;

Kaly Xin ; nd 
Subject: RE: [PATCH] xen/arm: Add Cortex-A73 erratum 858921 workaround

Resuming this old thread.

On Fri, 13 Nov 2020, Wei Chen wrote:

Hi,

On 09/11/2020 08:21, Penny Zheng wrote:

CNTVCT_EL0 or CNTPCT_EL0 counter read in Cortex-A73 (all versions)
might return a wrong value when the counter crosses a 32bit boundary.

Until now, there is no case for Xen itself to access CNTVCT_EL0,
and it also should be the Guest OS's responsibility to deal with
this part.

But for CNTPCT, there exists several cases in Xen involving reading
CNTPCT, so a possible workaround is that performing the read twice,
and to return one or the other depending on whether a transition has
taken place.

Signed-off-by: Penny Zheng 


Acked-by: Julien Grall 

On a related topic, do we need a fix similar to Linux commit
75a19a0202db "arm64: arch_timer: Ensure counter register reads occur
with seqlock held"?



I took a look at this Linux commit; it seems to prevent the seq-lock from being
speculated. Using enforced ordering instead of an ISB after the counter read
seems to be for performance reasons.

I have found that you had placed an ISB before read counter in get_cycles
to prevent counter value to be speculated. But you haven't placed the

second

ISB after reading. Is it because we haven't used the get_cycles in seq lock
critical context (Maybe I didn't find the right place)? So should we need to

fix it

now, or you prefer to fix it now for future usage?


Looking at the call sites, it doesn't look like we need any ISB after
get_cycles as it is not used in any critical context. There is also a
data dependency with the value returned by it.


I am assuming you looked at all the users of NOW(). Is that right?



So I am thinking we don't need any fix. At most we need an in-code comment?


I agree with you to add an in-code comment. It will remind us in future when

we

use the get_cycles in critical context. Adding it now will probably only lead to
meaningless performance degradation.


I read this as there would be no performance impact if we add the
ordering now. Is that what you intended to say?


Sorry about my English. I intended to say "Adding it now may introduce some
performance cost, and that cost may not be worth it, because Xen
may never use it in a similar scenario."


Don't worry! I think the performance should not be noticeable if we use 
the same trick as Linux.



In addition, AFAICT, the x86 version of get_cycles() is already able to
provide that ordering. So there are chances that code may rely on it.

While I don't necessarily agree to add barriers everywhere by default
(this may have big impact on the platform). I think it is better to have
an accurate number of cycles.



As x86 has done it, I think it's ok to do it for Arm. This will keep the
function behaving the same across architectures.


Just to be clear, I am not 100% sure this is what Intel is doing. 
Although this is my understanding of the comment in the code.


@Stefano, what do you think?

@Wei, assuming Stefano is happy with the proposal, would you be happy to 
send a patch for that?


Best regards,

--
Julien Grall



[qemu-mainline test] 157142: regressions - FAIL

2020-12-02 Thread osstest service owner
flight 157142 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157142/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631
 build-arm64-pvops 6 kernel-build fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 152631
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 152631
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeatfail  like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 152631
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 qemuud73c46e4a84e47ffc61b8bf7c378b1383e7316b5
baseline version:
 qemuu1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  104 days
Failing since152659  2020-08-21 14:07:39 Z  103 days  215 attempts
Testing same since   157142  2020-12-01 20:39:57 Z0 days1 attempts


People who touched revisions under test:
Aaron Lindsay 
  Alberto Garcia 
  Aleksandar Markovic 
  Alex Bennée 
  Alex Chen 
  Alex Williamson 
  Alexander Bulekov 
  Alexander von Gluck IV 
  AlexChen 
  Alexey Kirillov 
  Alistair Francis 
  Alistair Francis 
  Amey Narkhede 
  Ana Pazos 
  Andreas Gustafsson 
  Andrew Jones 
  Andrey Konovalov 
  Andrey Shinkevich 
  Ani Sinha 
  Anthony PERARD 
  Anton Blanchard 
  Anup Patel 
  Artyom Tarasenko 
  Babu Moger 
  BALATON Zoltan 
  Bastian Koppelmann 
  Ben Widawsky 
  Bharat Bhushan 
  Bihong Yu 
  Bin Meng 
  Brad Smith 
  Bruce Rogers 
  Carlo Marcelo Arenas Belón 
  Chen Gang 
  Chen Qun 
  Chetan Pant 
  Chih-Min Chao 
  Christian Bor

Re: [PATCH] gnttab: don't allocate status frame tracking array when "gnttab=max_ver:1"

2020-12-02 Thread Julien Grall

Hi Jan,

On 05/11/2020 15:55, Jan Beulich wrote:

This array can be large when many grant frames are permitted; avoid
allocating it when it's not going to be used anyway. 


Given there are not many users of grant-table v2, would it make sense to 
avoid allocating the array until the guest starts using grant-table v2?


Cheers,

--
Julien Grall



Re: [PATCH v3 1/5] evtchn: drop acquiring of per-channel lock from send_guest_{global,vcpu}_virq()

2020-12-02 Thread Julien Grall

Hi Jan,

On 23/11/2020 13:28, Jan Beulich wrote:

The per-vCPU virq_lock, which is being held anyway, together with there
not being any call to evtchn_port_set_pending() when v->virq_to_evtchn[]
is zero, provide sufficient guarantees. 


I agree that the per-vCPU virq_lock is going to be sufficient, however 
dropping the lock also means the event channel locking is more complex 
to understand (the long comment that was added proves it).


In fact, the locking in the event channel code was already proven to be 
quite fragile, therefore I think this patch is not worth the risk.


Cheers,

--
Julien Grall



[linux-linus test] 157143: regressions - FAIL

2020-12-02 Thread osstest service owner
flight 157143 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157143/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl  broken  in 157131
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-xsm7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-examine   6 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-raw7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. 
vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub 20 guest-stop  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub 20 guest-stop   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. 
vs. 152332
 test-amd64-i386-xl7 xen-install  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-xl  12 debian-install fail in 157109 REGR. vs. 152332
 test-arm64-arm64-xl-credit1 10 host-ping-check-xen fail in 157119 REGR. vs. 
152332
 test-arm64-arm64-libvirt-xsm 10 host-ping-check-xen fail in 157119 REGR. vs. 
152332
 test-arm64-arm64-xl-xsm  12 debian-install fail in 157119 REGR. vs. 152332
 test-arm64-arm64-examine 13 examine-iommu  fail in 157119 REGR. vs. 152332

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-seattle 10 host-ping-check-xen fail in 157119 pass in 
157131
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 
157119 pass in 157143
 test-armhf-armhf-libvirt  8 xen-boot fail in 157131 pass in 157143
 test-arm64-arm64-xl-xsm   8 xen-boot fail in 157131 pass in 157143
 test-arm64-arm64-xl   8 xen-boot   fail pass in 157109
 test-arm64-arm64-xl-xsm  10 host-ping-check-xenfail pass in 157119
 test-arm64-arm64-xl-credit1   8 xen-boot   fail pass in 157119
 test-arm64-arm64-libvirt-xsm  8 xen-boot   fail pass in 157119
 test-armhf-armhf-libvirt 18 guest-start/debian.repeat  fail pass in 157119
 test-arm64-arm64-examine  8 reboot fail pass in 157131
 test-arm64-arm64-xl-seattle   8 xen-boot   fail pass in 157131

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl   5 host-install(5) broken in 157131 blocked in 152332
 test-arm64-arm64-xl-xsm 11 leak-check/basis(11) fail in 157109 blocked in 
152332
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail in 157119 like 
152332
 test-arm64-arm64-xl-seattle 11 leak-check/basis(11) fail in 157131 blocked in 
152332
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 152332
 test-armhf-armhf-libvirt 16 saverestore-support-check  

Re: [PATCH v2 8/8] xen/arm: Add support for SMMUv3 driver

2020-12-02 Thread Rahul Singh
Hi Stefano,

> On 2 Dec 2020, at 4:27 pm, Rahul Singh  wrote:
> 
> Hello Stefano,
> 
>> On 2 Dec 2020, at 2:51 am, Stefano Stabellini  wrote:
>> 
>> On Thu, 26 Nov 2020, Rahul Singh wrote:
>>> Add support for ARM architected SMMUv3 implementation. It is based on
>>> the Linux SMMUv3 driver.
>>> 
>>> Major differences with regard to Linux driver are as follows:
>>> 1. Only Stage-2 translation is supported as compared to the Linux driver
>>>  that supports both Stage-1 and Stage-2 translations.
>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>  capability to share the page tables with the CPU.
>>> 3. Tasklets are used in place of threaded IRQ's in Linux for event queue
>>>  and priority queue IRQ handling.
>>> 4. Latest version of the Linux SMMUv3 code implements the commands queue
>>>  access functions based on atomic operations implemented in Linux.
>>>  Atomic functions used by the commands queue access functions are not
>>>  implemented in Xen, therefore we decided to port the earlier version
>>>  of the code. Once proper atomic operations become available in
>>>  Xen, the driver can be updated.
>>> 5. Driver is currently supported as Tech Preview.
>> 
>> This patch is big and was difficult to review, nonetheless I tried :-)
> 
> Thanks again for reviewing the code.
>> 
>> That said, the code is self-contained, marked as Tech Preview, and not
>> at risk of making other things unstable, so it is low risk to accept it.
>> 
>> Comments below.
>> 
>> 
>>> Signed-off-by: Rahul Singh 
>>> ---
>>> MAINTAINERS   |   6 +
>>> SUPPORT.md|   1 +
>>> xen/drivers/passthrough/Kconfig   |  10 +
>>> xen/drivers/passthrough/arm/Makefile  |   1 +
>>> xen/drivers/passthrough/arm/smmu-v3.c | 986 +-
>>> 5 files changed, 814 insertions(+), 190 deletions(-)
>>> 
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index dab38a6a14..1d63489eec 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -249,6 +249,12 @@ F: xen/include/asm-arm/
>>> F:  xen/include/public/arch-arm/
>>> F:  xen/include/public/arch-arm.h
>>> 
>>> +ARM SMMUv3
>>> +M: Bertrand Marquis 
>>> +M: Rahul Singh 
>>> +S: Supported
>>> +F: xen/drivers/passthrough/arm/smmu-v3.c
>>> +
>>> Change Log
>>> M:  Paul Durrant 
>>> R:  Community Manager 
>>> diff --git a/SUPPORT.md b/SUPPORT.md
>>> index ab02aca5f4..e402c7202d 100644
>>> --- a/SUPPORT.md
>>> +++ b/SUPPORT.md
>>> @@ -68,6 +68,7 @@ For the Cortex A57 r0p0 - r1p1, see Errata 832075.
>>>Status, ARM SMMUv1: Supported, not security supported
>>>Status, ARM SMMUv2: Supported, not security supported
>>>Status, Renesas IPMMU-VMSA: Supported, not security supported
>>> +Status, ARM SMMUv3: Tech Preview
>>> 
>>> ### ARM/GICv3 ITS
>>> 
>>> diff --git a/xen/drivers/passthrough/Kconfig 
>>> b/xen/drivers/passthrough/Kconfig
>>> index 0036007ec4..5b71c59f47 100644
>>> --- a/xen/drivers/passthrough/Kconfig
>>> +++ b/xen/drivers/passthrough/Kconfig
>>> @@ -13,6 +13,16 @@ config ARM_SMMU
>>>   Say Y here if your SoC includes an IOMMU device implementing the
>>>   ARM SMMU architecture.
>>> 
>>> +config ARM_SMMU_V3
>>> +   bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
>>> +   depends on ARM_64
>>> +   ---help---
>>> +Support for implementations of the ARM System MMU architecture
>>> +version 3.
>>> +
>>> +Say Y here if your system includes an IOMMU device implementing
>>> +the ARM SMMUv3 architecture.
>>> +
>>> config IPMMU_VMSA
>>> bool "Renesas IPMMU-VMSA found in R-Car Gen3 SoCs"
>>> depends on ARM_64
>>> diff --git a/xen/drivers/passthrough/arm/Makefile 
>>> b/xen/drivers/passthrough/arm/Makefile
>>> index fcd918ea3e..c5fb3b58a5 100644
>>> --- a/xen/drivers/passthrough/arm/Makefile
>>> +++ b/xen/drivers/passthrough/arm/Makefile
>>> @@ -1,3 +1,4 @@
>>> obj-y += iommu.o iommu_helpers.o iommu_fwspec.o
>>> obj-$(CONFIG_ARM_SMMU) += smmu.o
>>> obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
>>> +obj-$(CONFIG_ARM_SMMU_V3) += smmu-v3.o
>>> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
>>> b/xen/drivers/passthrough/arm/smmu-v3.c
>>> index 55d1cba194..8f2337e7f2 100644
>>> --- a/xen/drivers/passthrough/arm/smmu-v3.c
>>> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
>>> @@ -2,36 +2,280 @@
>>> /*
>>> * IOMMU API for ARM architected SMMUv3 implementations.
>>> *
>>> - * Copyright (C) 2015 ARM Limited
>>> + * Based on Linux's SMMUv3 driver:
>>> + *drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> + *commit: 951cbbc386ff01b50da4f46387e994e81d9ab431
>>> + * and Xen's SMMU driver:
>>> + *xen/drivers/passthrough/arm/smmu.c
>>> *
>>> - * Author: Will Deacon 
>>> + * Copyright (C) 2015 ARM Limited Will Deacon 
>>> *
>>> - * This driver is powered by bad coffee and bombay mix.
>>> + * Copyright (C) 2020 Arm Ltd.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 

Re: [PATCH v1] tools/hotplug: allow tuning of xenwatchdogd arguments

2020-12-02 Thread Jason Andryuk
On Wed, Dec 2, 2020 at 11:47 AM Olaf Hering  wrote:
>
> Currently the arguments for xenwatchdogd are hardcoded with 15s
> keep-alive interval and 30s timeout.
>
> It is not possible to tweak these values via
> /etc/systemd/system/xen-watchdog.service.d/*.conf because ExecStart
> can not be replaced. The only option would be a private copy
> /etc/systemd/system/xen-watchdog.service, which may get out of sync
> with the Xen provided xen-watchdog.service.
>
> Adjust the service file to recognize XENWATCHDOGD_ARGS= in a
> private unit configuration file.
>
> Signed-off-by: Olaf Hering 
> ---
>  tools/hotplug/Linux/init.d/xen-watchdog.in  | 7 ++-
>  tools/hotplug/Linux/systemd/xen-watchdog.service.in | 4 +++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tools/hotplug/Linux/init.d/xen-watchdog.in 
> b/tools/hotplug/Linux/init.d/xen-watchdog.in
> index c05f1f6b6a..87e2353b49 100644
> --- a/tools/hotplug/Linux/init.d/xen-watchdog.in
> +++ b/tools/hotplug/Linux/init.d/xen-watchdog.in
> @@ -19,6 +19,11 @@
>
>  . @XEN_SCRIPT_DIR@/hotplugpath.sh
>
> +xencommons_config=@CONFIG_DIR@/@CONFIG_LEAF_DIR@
> +
> +test -f $xencommons_config/xencommons && . $xencommons_config/xencommons
> +
> +test -z "$XENWATCHDOGD_ARGS" || XENWATCHDOGD_ARGS='15 30'

This should be `test -z ... && ` or `test -n ... || ` to set the
default values properly.

Regards,
Jason



Re: [PATCH v3 5/5] evtchn: don't call Xen consumer callback with per-channel lock held

2020-12-02 Thread Julien Grall

Hi Jan,

On 23/11/2020 13:30, Jan Beulich wrote:

While there don't look to be any problems with this right now, the lock
order implications from holding the lock can be very difficult to follow
(and may be easy to violate unknowingly). The present callbacks don't
(and no such callback should) have any need for the lock to be held.

However, vm_event_disable() frees the structures used by respective
callbacks and isn't otherwise synchronized with invocations of these
callbacks, so maintain a count of in-progress calls, for evtchn_close()
to wait to drop to zero before freeing the port (and dropping the lock).


AFAICT, this callback is not the only place where the synchronization is 
missing in the VM event code.


For instance, vm_event_put_request() can also race against 
vm_event_disable().


So shouldn't we handle this issue properly in VM event?



Signed-off-by: Jan Beulich 
---
Should we make this accounting optional, to be requested through a new
parameter to alloc_unbound_xen_event_channel(), or derived from other
than the default callback being requested?


Aside the VM event, do you see any value for the other caller?


---
v3: Drain callbacks before proceeding with closing. Re-base.

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -397,6 +397,7 @@ static long evtchn_bind_interdomain(evtc
  
  rchn->u.interdomain.remote_dom  = ld;

  rchn->u.interdomain.remote_port = lport;
+atomic_set(&rchn->u.interdomain.active_calls, 0);
  rchn->state = ECS_INTERDOMAIN;
  
  /*

@@ -720,6 +721,10 @@ int evtchn_close(struct domain *d1, int
  
  double_evtchn_lock(chn1, chn2);
  
+if ( consumer_is_xen(chn1) )

+while ( atomic_read(&chn1->u.interdomain.active_calls) )
+cpu_relax();
+
  evtchn_free(d1, chn1);
  
  chn2->state = ECS_UNBOUND;

@@ -781,9 +786,15 @@ int evtchn_send(struct domain *ld, unsig
  rport = lchn->u.interdomain.remote_port;
  rchn  = evtchn_from_port(rd, rport);
  if ( consumer_is_xen(rchn) )
+{
+/* Don't keep holding the lock for the call below. */
+atomic_inc(&rchn->u.interdomain.active_calls);
+evtchn_read_unlock(lchn);
  xen_notification_fn(rchn)(rd->vcpu[rchn->notify_vcpu_id], rport);
-else
-evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);


atomic_dec() doesn't contain any memory barrier, so we will want one 
between xen_notification_fn() and atomic_dec() to avoid re-ordering.



+atomic_dec(&rchn->u.interdomain.active_calls);
+return 0;
+}
+evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
  break;
  case ECS_IPI:
  evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn);
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -104,6 +104,7 @@ struct evtchn
  } unbound; /* state == ECS_UNBOUND */
  struct {
  evtchn_port_t  remote_port;
+atomic_t   active_calls;
  struct domain *remote_dom;
  } interdomain; /* state == ECS_INTERDOMAIN */
  struct {



Cheers,

--
Julien Grall



Re: [PATCH v3 2/5] evtchn: avoid access tearing for ->virq_to_evtchn[] accesses

2020-12-02 Thread Julien Grall

Hi Jan,

On 23/11/2020 13:28, Jan Beulich wrote:

Use {read,write}_atomic() to exclude any eventualities, in particular
observing that accesses aren't all happening under a consistent lock.

Requested-by: Julien Grall 
Signed-off-by: Jan Beulich 


Reviewed-by: Julien Grall 

Cheers,


---
v3: New.

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -446,7 +446,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t
  
  spin_lock(&d->event_lock);
  
-if ( v->virq_to_evtchn[virq] != 0 )

+if ( read_atomic(&v->virq_to_evtchn[virq]) )
  ERROR_EXIT(-EEXIST);
  
  if ( port != 0 )

@@ -474,7 +474,8 @@ int evtchn_bind_virq(evtchn_bind_virq_t
  
  evtchn_write_unlock(chn);
  
-v->virq_to_evtchn[virq] = bind->port = port;

+bind->port = port;
+write_atomic(&v->virq_to_evtchn[virq], port);
  
   out:

  spin_unlock(&d->event_lock);
@@ -660,9 +661,9 @@ int evtchn_close(struct domain *d1, int
  case ECS_VIRQ:
  for_each_vcpu ( d1, v )
  {
-if ( v->virq_to_evtchn[chn1->u.virq] != port1 )
+if ( read_atomic(&v->virq_to_evtchn[chn1->u.virq]) != port1 )
  continue;
-v->virq_to_evtchn[chn1->u.virq] = 0;
+write_atomic(&v->virq_to_evtchn[chn1->u.virq], 0);
  spin_barrier(&v->virq_lock);
  }
  break;
@@ -801,7 +802,7 @@ bool evtchn_virq_enabled(const struct vc
  if ( virq_is_global(virq) && v->vcpu_id )
  v = domain_vcpu(v->domain, 0);
  
-return v->virq_to_evtchn[virq];

+return read_atomic(&v->virq_to_evtchn[virq]);
  }
  
  void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq)

@@ -814,7 +815,7 @@ void send_guest_vcpu_virq(struct vcpu *v
  
  spin_lock_irqsave(&v->virq_lock, flags);
  
-port = v->virq_to_evtchn[virq];

+port = read_atomic(&v->virq_to_evtchn[virq]);
  if ( unlikely(port == 0) )
  goto out;
  
@@ -843,7 +844,7 @@ void send_guest_global_virq(struct domai
  
  spin_lock_irqsave(&v->virq_lock, flags);
  
-port = v->virq_to_evtchn[virq];

+port = read_atomic(&v->virq_to_evtchn[virq]);
  if ( unlikely(port == 0) )
  goto out;
  



--
Julien Grall



[xen-unstable test] 157147: tolerable FAIL

2020-12-02 Thread osstest service owner
flight 157147 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157147/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 20 guest-start/debianhvm.repeat fail in 157123 pass in 157147
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 157123

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 157123
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 157123
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 157123
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 157123
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 157123
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 157123
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 157123
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 157123
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 157123
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 157123
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 157123
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 157123
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  3ae469af8e680df31eecd0a2ac6a83b58ad7ce53
baseline version:
 xen  3ae469af8e680df31eecd0a2ac6a83b58ad7ce53

Last test of basis   157147  2020-12-02 01:52:26 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm

[xen-unstable-smoke test] 157163: tolerable all pass - PUSHED

2020-12-02 Thread osstest service owner
flight 157163 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157163/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  aec46884784c2494a30221da775d4ac2c43a4d42
baseline version:
 xen  cabf60fc32d4cfa1d74a2bdfcdb294a31da5d68e

Last test of basis   157157  2020-12-02 10:00:26 Z0 days
Testing same since   157163  2020-12-02 19:01:29 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   cabf60fc32..aec4688478  aec46884784c2494a30221da775d4ac2c43a4d42 -> smoke



[PATCH v2 2/2] x86/IRQ: allocate guest array of max size only for shareable IRQs

2020-12-02 Thread Igor Druzhinin
... and increase default "irq_max_guests" to 32.

It's not necessary to have an array larger than 1 entry for non-shareable
IRQs, and allocating more might impact scalability when high "irq_max_guests"
values are used: every IRQ in the system, including MSIs, would be
supplied with an array of that size.

Since using a higher "irq_max_guests" value is now less impactful, bump
the default to 32. That should give more headroom for future systems.

Signed-off-by: Igor Druzhinin 
---

New in v2.
This is suggested by Jan and is optional for me.

---
 docs/misc/xen-command-line.pandoc | 2 +-
 xen/arch/x86/irq.c| 7 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index f5f230c..dea2a22 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1644,7 +1644,7 @@ This option is ignored in **pv-shim** mode.
 ### irq_max_guests (x86)
 > `= `
 
-> Default: `16`
+> Default: `32`
 
 Maximum number of guests an IRQ can be shared between, i.e. a limit on
 the number of guests it is possible to start, each having been assigned a device
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 5ae9846..70b7a53 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -440,7 +440,7 @@ int __init init_irq_data(void)
 irq_to_desc(irq)->irq = irq;
 
 if ( !irq_max_guests || irq_max_guests > 255)
-irq_max_guests = 16;
+irq_max_guests = 32;
 
 #ifdef CONFIG_PV
 /* Never allocate the hypercall vector or Linux/BSD fast-trap vector. */
@@ -1540,6 +1540,7 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
  unsigned int irq;
 struct irq_desc *desc;
 irq_guest_action_t *action, *newaction = NULL;
+unsigned int max_nr_guests = will_share ? irq_max_guests : 1;
 int rc = 0;
 
 WARN_ON(!spin_is_locked(&v->domain->event_lock));
@@ -1571,7 +1572,7 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
 {
 spin_unlock_irq(&desc->lock);
 if ( (newaction = xmalloc_bytes(sizeof(irq_guest_action_t) +
-  irq_max_guests * sizeof(action->guest[0]))) != NULL &&
+  max_nr_guests * sizeof(action->guest[0]))) != NULL &&
  zalloc_cpumask_var(&newaction->cpu_eoi_map) )
 goto retry;
 xfree(newaction);
@@ -1640,7 +1641,7 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
 goto retry;
 }
 
-if ( action->nr_guests == irq_max_guests )
+if ( action->nr_guests == max_nr_guests )
 {
 printk(XENLOG_G_INFO "Cannot bind IRQ%d to dom%d. "
 "Already at max share %u, increase with irq_max_guests= option.\n",
-- 
2.7.4




[PATCH v2 1/2] x86/IRQ: make max number of guests for a shared IRQ configurable

2020-12-02 Thread Igor Druzhinin
... and increase the default to 16.

The current limit of 7 is too restrictive for modern systems, where one GSI
could be shared by potentially many PCI INTx sources, each of them
corresponding to a device passed through to its own guest. Some systems do not
apply due diligence in swizzling INTx links when e.g. INTA is declared as
the interrupt pin for the majority of PCI devices behind a single router,
resulting in overuse of a GSI.

Introduce a new command line option to configure that limit and dynamically
allocate an array of the necessary size. Set the default to 16 for now, which
is higher than 7 and could later be increased further if necessary.

Signed-off-by: Igor Druzhinin 
---

Changes in v2:
- introduced a command line option as suggested
- set the default limit to 16 for now

---
 docs/misc/xen-command-line.pandoc |  9 +
 xen/arch/x86/irq.c| 19 +--
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index b4a0d60..f5f230c 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1641,6 +1641,15 @@ This option is ignored in **pv-shim** mode.
 ### nr_irqs (x86)
 > `= `
 
+### irq_max_guests (x86)
+> `= `
+
+> Default: `16`
+
+Maximum number of guests an IRQ can be shared between, i.e. a limit on
+the number of guests it is possible to start, each having been assigned a device
+sharing a common interrupt line.  Accepts values between 1 and 255.
+
 ### numa (x86)
 > `= on | off | fake= | noacpi`
 
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 8d1f9a9..5ae9846 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -42,6 +42,10 @@ integer_param("nr_irqs", nr_irqs);
 int __read_mostly opt_irq_vector_map = OPT_IRQ_VECTOR_MAP_DEFAULT;
 custom_param("irq_vector_map", parse_irq_vector_map_param);
 
+/* Max number of guests IRQ could be shared with */
+static unsigned int __read_mostly irq_max_guests;
+integer_param("irq_max_guests", irq_max_guests);
+
 vmask_t global_used_vector_map;
 
 struct irq_desc __read_mostly *irq_desc = NULL;
@@ -435,6 +439,9 @@ int __init init_irq_data(void)
 for ( ; irq < nr_irqs; irq++ )
 irq_to_desc(irq)->irq = irq;
 
+if ( !irq_max_guests || irq_max_guests > 255)
+irq_max_guests = 16;
+
 #ifdef CONFIG_PV
 /* Never allocate the hypercall vector or Linux/BSD fast-trap vector. */
 set_bit(LEGACY_SYSCALL_VECTOR, used_vectors);
@@ -1028,7 +1035,6 @@ int __init setup_irq(unsigned int irq, unsigned int irqflags,
  * HANDLING OF GUEST-BOUND PHYSICAL IRQS
  */
 
-#define IRQ_MAX_GUESTS 7
 typedef struct {
 u8 nr_guests;
 u8 in_flight;
@@ -1039,7 +1045,7 @@ typedef struct {
 #define ACKTYPE_EOI2 /* EOI on the CPU that was interrupted  */
 cpumask_var_t cpu_eoi_map; /* CPUs that need to EOI this interrupt */
 struct timer eoi_timer;
-struct domain *guest[IRQ_MAX_GUESTS];
+struct domain *guest[];
 } irq_guest_action_t;
 
 /*
@@ -1564,7 +1570,8 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
 if ( newaction == NULL )
 {
 spin_unlock_irq(&desc->lock);
-if ( (newaction = xmalloc(irq_guest_action_t)) != NULL &&
+if ( (newaction = xmalloc_bytes(sizeof(irq_guest_action_t) +
+  irq_max_guests * sizeof(action->guest[0]))) != NULL &&
  zalloc_cpumask_var(&newaction->cpu_eoi_map) )
 goto retry;
 xfree(newaction);
@@ -1633,11 +1640,11 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
 goto retry;
 }
 
-if ( action->nr_guests == IRQ_MAX_GUESTS )
+if ( action->nr_guests == irq_max_guests )
 {
 printk(XENLOG_G_INFO "Cannot bind IRQ%d to dom%d. "
-   "Already at max share.\n",
-   pirq->pirq, v->domain->domain_id);
+   "Already at max share %u, increase with irq_max_guests= option.\n",
+   pirq->pirq, v->domain->domain_id, irq_max_guests);
 rc = -EBUSY;
 goto unlock_out;
 }
-- 
2.7.4




RE: [PATCH] xen/arm: Add Cortex-A73 erratum 858921 workaround

2020-12-02 Thread Wei Chen
Hi Julien,

> -Original Message-
> From: Julien Grall 
> Sent: 3 December 2020 2:11
> To: Wei Chen ; Stefano Stabellini 
> Cc: Penny Zheng ; xen-devel@lists.xenproject.org;
> Andre Przywara ; Bertrand Marquis
> ; Kaly Xin ; nd
> 
> Subject: Re: [PATCH] xen/arm: Add Cortex-A73 erratum 858921 workaround
> 
> 
> 
> On 26/11/2020 11:27, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -Original Message-
> >> From: Julien Grall 
> >> Sent: 26 November 2020 18:55
> >> To: Wei Chen ; Stefano Stabellini
> 
> >> Cc: Penny Zheng ; xen-devel@lists.xenproject.org;
> >> Andre Przywara ; Bertrand Marquis
> >> ; Kaly Xin ; nd
> >> 
> >> Subject: Re: [PATCH] xen/arm: Add Cortex-A73 erratum 858921 workaround
> >>
> >> Hi Wei,
> >>
> >> Your e-mail font seems to be different to the usual plain text one. Are
> >> you sending the e-mail using HTML by any chance?
> >>
> >
> > It's strange, I always use the plain text.
> 
> Maybe exchange decided to mangle the e-mail :). Anyway, this new message
> looks fine.
> 
> >
> >> On 26/11/2020 02:07, Wei Chen wrote:
> >>> Hi Stefano,
> >>>
>  -Original Message-
>  From: Stefano Stabellini 
 Sent: 26 November 2020 8:00
>  To: Wei Chen 
>  Cc: Julien Grall ; Penny Zheng ;
> >> xen-
>  de...@lists.xenproject.org; sstabell...@kernel.org; Andre Przywara
>  ; Bertrand Marquis
> >> ;
>  Kaly Xin ; nd 
>  Subject: RE: [PATCH] xen/arm: Add Cortex-A73 erratum 858921
> workaround
> 
>  Resuming this old thread.
> 
>  On Fri, 13 Nov 2020, Wei Chen wrote:
> >> Hi,
> >>
> >> On 09/11/2020 08:21, Penny Zheng wrote:
> >>> CNTVCT_EL0 or CNTPCT_EL0 counter read in Cortex-A73 (all versions)
> >>> might return a wrong value when the counter crosses a 32bit boundary.
> >>>
> >>> Until now, there is no case for Xen itself to access CNTVCT_EL0,
> >>> and it also should be the Guest OS's responsibility to deal with
> >>> this part.
> >>>
> >>> But for CNTPCT, there exists several cases in Xen involving reading
> >>> CNTPCT, so a possible workaround is that performing the read twice,
> >>> and to return one or the other depending on whether a transition has
> >>> taken place.
> >>>
> >>> Signed-off-by: Penny Zheng 
> >>
> >> Acked-by: Julien Grall 
> >>
> >> On a related topic, do we need a fix similar to Linux commit
> >> 75a19a0202db "arm64: arch_timer: Ensure counter register reads occur
> >> with seqlock held"?
> >>
> >
> > I take a look at this Linux commit, it seems to prevent the seq-lock to 
> > be
> > speculated.  Using an enforce ordering instead of ISB after the read
> counter
> > operation seems to be for performance reasons.
> >
> > I have found that you had placed an ISB before read counter in 
> > get_cycles
> > to prevent counter value to be speculated. But you haven't placed the
> >> second
> > ISB after reading. Is it because we haven't used the get_cycles in seq 
> > lock
> > critical context (Maybe I didn't find the right place)? So should we 
> > need to
> >> fix it
> > now, or you prefer to fix it now for future usage?
> 
>  Looking at the call sites, it doesn't look like we need any ISB after
>  get_cycles as it is not used in any critical context. There is also a
>  data dependency with the value returned by it.
> >>
> >> I am assuming you looked at all the users of NOW(). Is that right?
> >>
> 
>  So I am thinking we don't need any fix. At most we need an in-code
> comment?
> >>>
> >>> I agree with you to add an in-code comment. It will remind us in future
> when
> >> we
> >>> use the get_cycles in critical context. Adding it now will probably only 
> >>> lead
> to
> >>> meaningless performance degradation.
> >>
> >> I read this as there would be no perfomance impact if we add the
> >> ordering it now. Did you intend to say?
> >
> > Sorry about my English. I intended to say "Adding it now may introduce some
> > performance cost. And this performance cost may be not worth. Because Xen
> > may never use it in a similar scenario "
> 
> Don't worry! I think the performance should not be noticeable if we use
> the same trick as Linux.
> 
> >> In addition, AFAICT, the x86 version of get_cycles() is already able to
> >> provide that ordering. So there are chances that code may rely on it.
> >>
> >> While I don't necessarily agree to add barriers everywhere by default
> >> (this may have big impact on the platform). I think it is better to have
> >> an accurate number of cycles.
> >>
> >
> > As x86 had done it, I think it’s ok to do it for Arm. This will keep a 
> > function
> > behaves the same on different architectures.
> 
> Just to be clear, I am not 100% sure this is what Intel is doing.
> Although this is my understanding of the comment in the code.
> 
> @Stefano, what do you think?
> 
> @Wei, assuming Stefano is happy with the proposal, would you be h

Re: [PATCH v2 8/8] xen/arm: Add support for SMMUv3 driver

2020-12-02 Thread Stefano Stabellini
On Wed, 2 Dec 2020, Julien Grall wrote:
> On 02/12/2020 02:51, Stefano Stabellini wrote:
> > On Thu, 26 Nov 2020, Rahul Singh wrote:
> > > +/* Alias to Xen device tree helpers */
> > > +#define device_node dt_device_node
> > > +#define of_phandle_args dt_phandle_args
> > > +#define of_device_id dt_device_match
> > > +#define of_match_node dt_match_node
> > > +#define of_property_read_u32(np, pname, out) (!dt_property_read_u32(np,
> > > pname, out))
> > > +#define of_property_read_bool dt_property_read_bool
> > > +#define of_parse_phandle_with_args dt_parse_phandle_with_args
> > 
> > Given all the changes to the file by the previous patches we are
> > basically fully (or almost fully) adapting this code to Xen.
> > 
> > So at that point I wonder if we should just as well make these changes
> > (e.g. s/of_phandle_args/dt_phandle_args/g) to the code too.
> 
> I have already accepted the fact that keeping Linux code as-is is nearly
> impossible without many workarounds :). The benefits also tend to be limited, as
> we noticed for the SMMU driver.
> 
> I would like to point out that this may make quite difficult (if not
> impossible) to revert the previous patches which remove support for some
> features (e.g. atomic, MSI, ATS).
> 
> If we are going to adapt the code to Xen (I'd like to keep Linux code style
> though), then I think we should consider to keep code that may be useful in
> the near future (at least MSI, ATS).

(I am fine with keeping the Linux code style.)

We could try to keep the code as similar to Linux as possible. This
didn't work out in the past.

Otherwise, we could fully adapt the driver to Xen. If we fully adapt the
driver to Xen (code style aside) it is better to be consistent and also
do substitutions like s/of_phandle_args/dt_phandle_args/g. Then the
policy becomes clear: the code comes from Linux but it is 100% adapted
to Xen.


Now the question about what to do about the MSI and ATS code is a good
one. We know that we are going to want that code at some point in the
next 2 years. Like you wrote, if we fully adapt the code to Xen and
remove MSI and ATS code, then it is going to be harder to add it back.

So maybe keeping the MSI and ATS code for now, even if it cannot work,
would be better. I think this strategy works well if the MSI and ATS
code can be disabled easily, i.e. with a couple of lines of code in the
init function rather than #ifdef everywhere. It doesn't work well if we
have to add #ifdef everywhere.

It looks like MSI could be disabled adding a couple of lines to
arm_smmu_setup_msis.

Similarly ATS seems to be easy to disable by forcing ats_enabled to
false.

So yes, this looks like a good way forward. Rahul, what do you think?


 
> > > +#define FIELD_GET(_mask, _reg)  \
> > > +(typeof(_mask))(((_reg) & (_mask)) >> (__builtin_ffsll(_mask) - 1))
> > > +
> > > +#define WRITE_ONCE(x, val)  \
> > > +do {\
> > > +*(volatile typeof(x) *)&(x) = (val);\
> > > +} while (0)
> > 
> > maybe we should define this in xen/include/xen/lib.h
> 
> I have attempted such a discussion in the past and it resulted in more
> bikeshedding than it was worth. So I would suggest to re-implement WRITE_ONCE()
> as write_atomic() for now.

Good suggestion, less discussions more code :-)



RE: [PATCH] xen/arm: Add Cortex-A73 erratum 858921 workaround

2020-12-02 Thread Stefano Stabellini
On Thu, 3 Dec 2020, Wei Chen wrote:

[linux-5.4 test] 157153: tolerable FAIL - PUSHED

2020-12-02 Thread osstest service owner
flight 157153 linux-5.4 real [real]
flight 157170 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/157153/
http://logs.test-lab.xenproject.org/osstest/logs/157170/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-vhd  20 leak-check/checkfail pass in 157170-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 156984
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 156984
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 156984
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 156984
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 156984
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 156984
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 156984
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 156984
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 156984
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 156984
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 156984
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 linux 42af416d71462a72b02ba6ac632c8dcb9ce729a0
baseline version:
 linux 9f4b26f3ea18cb2066c9e58a84ff202c71739a41

Last test of basis   156984  2020-11-24 13:11:22 Z8 days
Testing same since   157153  2020-12-02 08:12:37 Z0 days1 attempts


People who touched revisions under test:
  Alan Stern 
  Anand K Mistry 
  Andrew Lunn 
  Ard Biesheuve

Re: [SPECIFICATION RFC] The firmware and bootloader log specification

2020-12-02 Thread Julius Werner
Standardizing in-memory logging sounds like an interesting idea,
especially with regards to components that can run on top of different
firmware stacks (things like GRUB or TF-A). But I would be a bit wary
of creating a "new standard to rule them all" and then expecting all
projects to switch what they have over to that. I think we all know
https://xkcd.com/927/.

Have you looked around and evaluated existing solutions that already
have some proliferation first? I think it would be much easier to
convince people to standardize on something that already has existing
users and drivers available in multiple projects.

In coreboot we're using a very simple character ring buffer that only
has two 4-byte header fields: total size and cursor (i.e. current
position where you would write the next character). The top 4 bits of
the cursor field are reserved for flags, one of which is the
"overflow" flag that tells you whether the ring-buffer already
overflowed or not (so readers know whether to print the whole ring
buffer, or only from the start to the current cursor). We try to
dimension the buffers so they don't overflow on a single boot, but
this approach allows us to log multiple boots on platforms that can
persist memory across reboots, which sometimes comes in handy.

The disadvantages of that approach compared to your proposal are lack
of some features, like the facilities field (although one can still
just print a tag like "<0>" or "<4>" behind each newline) or
timestamps (coreboot instead has separate timestamp logging). But I
think a really big advantage is size: in early firmware environments
before DDR training, space is often very precious and we struggle to
find more than a couple of kilobytes for the log buffer. If I look at
the structure you proposed, that's already 24 bytes of overhead per
individual message. If we were hooking that up to our normal printk()
facility in coreboot (such that each invocation creates a new message
header), that would probably waste about a third of the whole log
buffer on overhead. I think a complicated, syslog-style logging
structure that stores individual message blocks instead of a
continuous character string isn't really suitable for firmware
logging.

FWIW the coreboot console has existing support in Linux
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/google/memconsole-coreboot.c),
SeaBIOS 
(https://github.com/coreboot/seabios/blob/master/src/fw/coreboot.c#L219),
TF-A 
(https://github.com/ARM-software/arm-trusted-firmware/blob/master/drivers/coreboot/cbmem_console/aarch64/cbmem_console.S),
GRUB 
(https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/i386/coreboot/cbmemc.c),
U-Boot 
(https://github.com/u-boot/u-boot/blob/master/drivers/misc/cbmem_console.c)
and probably a couple of others I'm not aware of. And the code to add
support (especially when only appending) is so simple that it just
takes a couple of lines to implement (binary code size to implement
the format is also always a concern for firmware environments).

On Wed, Nov 18, 2020 at 7:04 AM Heinrich Schuchardt  wrote:
>
> On 14.11.20 00:52, Daniel Kiper wrote:
> > Hey,
> >
> > This is the next attempt to create a firmware and bootloader log
> > specification. Due to high interest across the industry it is an extension
> > of the initial bootloader-log-only specification. It takes into account
> > most of the comments I have received so far.
> >
> > The goal is to pass all logs produced by various boot components to the
> > running OS. The OS kernel should expose these logs to the user space
> > and/or process them internally if needed. The content of these logs
> > should be human readable. However, they should also contain the
> > information which allows admins to do e.g. boot time analysis.
> >
> > The log specification should be as platform-agnostic and self-contained
> > as possible. The final version of this spec should be merged into
> > existing specifications, e.g. UEFI, ACPI, Multiboot2, or be a standalone
> > spec, e.g. as a part of OASIS Standards. The former seems better but is
> > not perfect either...
> >
> > Here is the description (pseudocode) of the structures which will be
> > used to store the log data.
>
> Hello Daniel,
>
> thanks for your suggestion which makes good sense to me.
>
> Why can't we simply use the message format defined in "The Syslog
> Protocol", https://tools.ietf.org/html/rfc5424?
>
> >
> >   struct bf_log
> >   {
> > uint32_t   version;
> > char   producer[64];
> > uint64_t   flags;
> > uint64_t   next_bf_log_addr;
> > uint32_t   next_msg_off;
> > bf_log_msg msgs[];
>
> As bf_log_msg does not have a defined length, msgs[] cannot be an array.
>
> >   }
> >
> >   struct bf_log_msg
> >   {
> > uint32_t size;
> > uint64_t ts_nsec;
> > uint32_t level;
> > uint32_t facility;
> > uint32_t msg_off;
> > char strings[];
> >   }
> >
> > The members of struct bf_log:
> >   

[PATCH v2] tools/hotplug: allow tuning of xenwatchdogd arguments

2020-12-02 Thread Olaf Hering
Currently the arguments for xenwatchdogd are hardcoded to a 15s
keep-alive interval and a 30s timeout.

It is not possible to tweak these values via
/etc/systemd/system/xen-watchdog.service.d/*.conf because ExecStart
cannot be replaced. The only option would be a private copy of
/etc/systemd/system/xen-watchdog.service, which may get out of sync
with the Xen-provided xen-watchdog.service.

Adjust the service file to recognize XENWATCHDOGD_ARGS= in a
private unit configuration file.

Signed-off-by: Olaf Hering 
---

v2: fix "test -n" in init.d

 tools/hotplug/Linux/init.d/xen-watchdog.in  | 7 ++-
 tools/hotplug/Linux/systemd/xen-watchdog.service.in | 4 +++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/hotplug/Linux/init.d/xen-watchdog.in 
b/tools/hotplug/Linux/init.d/xen-watchdog.in
index c05f1f6b6a..b36a94bd8e 100644
--- a/tools/hotplug/Linux/init.d/xen-watchdog.in
+++ b/tools/hotplug/Linux/init.d/xen-watchdog.in
@@ -19,6 +19,11 @@
 
 . @XEN_SCRIPT_DIR@/hotplugpath.sh
 
+xencommons_config=@CONFIG_DIR@/@CONFIG_LEAF_DIR@
+
+test -f $xencommons_config/xencommons && . $xencommons_config/xencommons
+
+test -n "$XENWATCHDOGD_ARGS" || XENWATCHDOGD_ARGS='30 15'
 DAEMON=${sbindir}/xenwatchdogd
 base=$(basename $DAEMON)
 
@@ -46,7 +51,7 @@ start() {
local r
echo -n $"Starting domain watchdog daemon: "
 
-   $DAEMON 30 15
+   $DAEMON $XENWATCHDOGD_ARGS
r=$?
[ "$r" -eq 0 ] && success $"$base startup" || failure $"$base startup"
echo
diff --git a/tools/hotplug/Linux/systemd/xen-watchdog.service.in 
b/tools/hotplug/Linux/systemd/xen-watchdog.service.in
index 1eecd2a616..637ab7fd7f 100644
--- a/tools/hotplug/Linux/systemd/xen-watchdog.service.in
+++ b/tools/hotplug/Linux/systemd/xen-watchdog.service.in
@@ -6,7 +6,9 @@ ConditionPathExists=/proc/xen/capabilities
 
 [Service]
 Type=forking
-ExecStart=@sbindir@/xenwatchdogd 30 15
+Environment="XENWATCHDOGD_ARGS=30 15"
+EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons
+ExecStart=@sbindir@/xenwatchdogd $XENWATCHDOGD_ARGS
 KillSignal=USR1
 
 [Install]



Re: [PATCH v2] tools/hotplug: allow tuning of xenwatchdogd arguments

2020-12-02 Thread Jürgen Groß

On 03.12.20 07:34, Olaf Hering wrote:

Currently the arguments for xenwatchdogd are hardcoded to a 15s
keep-alive interval and a 30s timeout.

It is not possible to tweak these values via
/etc/systemd/system/xen-watchdog.service.d/*.conf because ExecStart
cannot be replaced. The only option would be a private copy of
/etc/systemd/system/xen-watchdog.service, which may get out of sync
with the Xen-provided xen-watchdog.service.

Adjust the service file to recognize XENWATCHDOGD_ARGS= in a
private unit configuration file.

Signed-off-by: Olaf Hering 
---

v2: fix "test -n" in init.d

  tools/hotplug/Linux/init.d/xen-watchdog.in  | 7 ++-
  tools/hotplug/Linux/systemd/xen-watchdog.service.in | 4 +++-
  2 files changed, 9 insertions(+), 2 deletions(-)


Could you please add a section for XENWATCHDOGD_ARGS in
tools/hotplug/Linux/init.d/sysconfig.xencommons.in ?


Juergen




Re: [PATCH v2] tools/hotplug: allow tuning of xenwatchdogd arguments

2020-12-02 Thread Olaf Hering
On Thu, 3 Dec 2020 07:47:58 +0100,
Jürgen Groß wrote:

> Could you please add a section for XENWATCHDOGD_ARGS in
> tools/hotplug/Linux/init.d/sysconfig.xencommons.in ?

No. Such details have to go into a to-be-written xencommons(8).

There will be a xenwatchdogd(8) shortly, which will cover this knob.


Olaf




[qemu-mainline test] 157162: regressions - FAIL

2020-12-02 Thread osstest service owner
flight 157162 qemu-mainline real [real]
flight 157173 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/157162/
http://logs.test-lab.xenproject.org/osstest/logs/157173/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                fail like 152631
 test-amd64-i386-xl-qemuu-win7-amd64  19 guest-stop                fail like 152631
 test-armhf-armhf-libvirt             16 saverestore-support-check fail like 152631
 test-armhf-armhf-xl-rtds             18 guest-start/debian.repeat fail like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                fail like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64  19 guest-stop                fail like 152631
 test-amd64-amd64-qemuu-nested-amd    20 debian-hvm-install/l1/l2  fail like 152631
 test-armhf-armhf-libvirt-raw         15 saverestore-support-check fail like 152631
 test-amd64-i386-xl-pvshim            14 guest-start               fail never pass
 test-arm64-arm64-xl-seattle          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-seattle          16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm         15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt             15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm          15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt              15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-vhd         14 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm              15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm              16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm         15 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm         16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx         15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx         16 saverestore-support-check fail never pass
 test-arm64-arm64-xl                  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl                  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit2          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale          16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck       15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck       16 saverestore-support-check fail never pass
 test-armhf-armhf-xl                  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu        15 migrate-support-check     fail never pass
 test-armhf-armhf-xl                  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu        16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit1          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds             15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds             16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2          16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt             15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd              14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd              15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw         14 migrate-support-check     fail never pass

version targeted for testing:
 qemuu                d73c46e4a84e47ffc61b8bf7c378b1383e7316b5
baseline version:
 qemuu                1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  104 days
Failing since        152659  2020-08-21 14:07:39 Z  103 days  216 attempts
Testing same since   157142  2020-12-01 20:39:57 Z    1 days    2 attempts


People who touched revisions under test:
Aaron Lindsay 
  Alberto Garcia 
  Aleks