Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Wu, Feng


> -----Original Message-----
> From: xen-devel-boun...@lists.xen.org
> [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
> Sent: Thursday, June 25, 2015 2:35 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: george.dun...@eu.citrix.com; Zhang, Yang Z; Tian, Kevin; k...@xen.org;
> jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
> 
> On 24/06/15 06:18, Feng Wu wrote:
> > This patch adds cmpxchg16b support for x86-64, so software
> > can perform 128-bit atomic write/read.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > Newly added.
> >
> >  xen/include/asm-x86/x86_64/system.h | 28
> 
> >  xen/include/xen/types.h |  5 +
> >  2 files changed, 33 insertions(+)
> >
> > diff --git a/xen/include/asm-x86/x86_64/system.h
> b/xen/include/asm-x86/x86_64/system.h
> > index 662813a..a910d00 100644
> > --- a/xen/include/asm-x86/x86_64/system.h
> > +++ b/xen/include/asm-x86/x86_64/system.h
> > @@ -6,6 +6,34 @@
> > (unsigned
> long)(n),sizeof(*(ptr
> >
> >  /*
> > + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
> > + * identical, store NEW in MEM.  Return the initial value in MEM.
> > + * Success is indicated by comparing RETURN with OLD.
> > + *
> > + * This function can only be called when cpu_has_cx16 is true.
> > + */
> > +
> > +static always_inline uint128_t __cmpxchg16b(
> > +volatile void *ptr, uint128_t old, uint128_t new)
> 
> It is not nice for register scheduling taking uint128_t's by value.
> Instead, I would pass them by pointer and let the inlining sort the
> eventual references out.
> 
> > +{
> > +uint128_t prev;
> > +
> > +ASSERT(cpu_has_cx16);
> 
> Given that if this assertion were to fail, cmpxchg16b would fail with
> #UD, I would hand-code a asm_fixup section which in turn panics.  This
> avoids a situation where non-debug builds could die with an unqualified
> #UD exception.

Is there an existing way to panic the hypervisor from assembler code? I
couldn't find one; I would appreciate it if you could point it out.

> 
> Also, you must enforce 16-byte alignment of the memory reference, as
> described in the manual.

What should I do if the caller passes data that is not 16-byte aligned
(struct iremap_entry in this case)? Does this mean I need to define
it like this?

struct iremap_entry {

..

} __attribute__ ((aligned (16)));

Thanks,
Feng

> 
> ~Andrew
> 
> > +
> > +asm volatile ( "lock; cmpxchg16b %4"
> > +   : "=d" (prev.high), "=a" (prev.low)
> > +   : "c" (new.high), "b" (new.low),
> > +   "m" (*__xg((volatile void *)ptr)),
> > +   "0" (old.high), "1" (old.low)
> > +   : "memory" );
> > +
> > +return prev;
> > +}
> > +
> > +#define cmpxchg16b(ptr,o,n)
> \
> > +__cmpxchg16b((ptr), *(uint128_t *)(o), *(uint128_t *)(n))
> > +
> > +/*
> >   * This function causes value _o to be changed to _n at location _p.
> >   * If this access causes a fault then we return 1, otherwise we return 0.
> >   * If no fault occurs then _o is updated to the value we saw at _p. If this
> > diff --git a/xen/include/xen/types.h b/xen/include/xen/types.h
> > index 8596ded..30f8a44 100644
> > --- a/xen/include/xen/types.h
> > +++ b/xen/include/xen/types.h
> > @@ -47,6 +47,11 @@ typedef __u64   uint64_t;
> >  typedef __u64   u_int64_t;
> >  typedef __s64   int64_t;
> >
> > +typedef struct {
> > +uint64_t low;
> > +uint64_t high;
> > +} uint128_t;
> > +
> >  struct domain;
> >  struct vcpu;
> >
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.

2015-07-08 Thread Jan Beulich
>>> On 07.07.15 at 18:24,  wrote:
> I'm disappointed that you think that. I respect yours, Jan's, etc. role
> as maintainers, and your absolute right to reject anything you think
> is inappropriate. It's clear that Jan, and now apparently you, don't
> respect my abilities or desire to do good work.

I'm not sure what you deduced this from: Not agreeing with a
certain implementation detail decision you took doesn't mean a
lack of respect of your abilities or desire to do good work, at
least not to me. I regret if you felt offended in any way.

Jan




Re: [Xen-devel] [v3 01/15] Vt-d Posted-intterrupt (PI) design

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> Add the design doc for VT-d PI.
> 
> Signed-off-by: Feng Wu 

Reviewed-by: Kevin Tian 


> +So, gist of above is that, lowest priority interrupts has never been delivered as
> +"lowest priority" in physical hardware.
> +
> +I will emulate vector hashing for posted-interrupt for XEN.

"I will" is not a good usage in design doc. :-)



Re: [Xen-devel] [v3 02/15] Add helper macro for X86_FEATURE_CX16 feature detection

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> Add macro cpu_has_cx16 to detect X86_FEATURE_CX16 feature.
> 
> Signed-off-by: Feng Wu 

Reviewed-by: Kevin Tian 




Re: [Xen-devel] [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.

2015-07-08 Thread Jan Beulich
>>> On 07.07.15 at 19:38,  wrote:
> In order to make forward progress, do the other maintainers (Jan, Andrew, 
> Tim) agree with the patch direction that George has suggested for this 
> particular patch? 

I for my part do, with the assumption that post-4.6 consolidation of
the increasingly ugly interface is going to be done.

Jan




Re: [Xen-devel] [v3 01/15] Vt-d Posted-intterrupt (PI) design

2015-07-08 Thread Wu, Feng


> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 3:21 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 01/15] Vt-d Posted-intterrupt (PI) design
> 
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > Add the design doc for VT-d PI.
> >
> > Signed-off-by: Feng Wu 
> 
> Reviewed-by: Kevin Tian 
> 
> 
> > +So, gist of above is that, lowest priority interrupts has never been delivered as
> > +"lowest priority" in physical hardware.
> > +
> > +I will emulate vector hashing for posted-interrupt for XEN.
> 
> "I will" is not a good usage in design doc. :-)

Thanks for the review, I will rephrase it! :)

Thanks,
Feng



Re: [Xen-devel] [v3 04/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> This patch adds variable 'iommu_intpost' to control whether enable VT-d
> posted-interrupt or not in the generic IOMMU code.
> 
> Signed-off-by: Feng Wu 

Reviewed-by: Kevin Tian 




Re: [Xen-devel] [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> This patch adds feature detection logic for VT-d posted-interrupt.
> 
> Signed-off-by: Feng Wu 
> ---
> v3:
> - Remove the "if no intremap then no intpost" logic in
>   intel_vtd_setup(), it is covered in the iommu_setup().
> - Add "if no intremap then no intpost" logic in the end
>   of init_vtd_hw() which is called by vtd_resume().
> 
> So the logic exists in the following three places:
> - parse_iommu_param()
> - iommu_setup()
> - init_vtd_hw()
> 
>  xen/drivers/passthrough/vtd/iommu.c | 18 --
>  xen/drivers/passthrough/vtd/iommu.h |  1 +
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 9053a1f..4221185 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2071,6 +2071,9 @@ static int init_vtd_hw(void)
>  disable_intremap(drhd->iommu);
>  }
> 
> +if ( !iommu_intremap )
> +iommu_intpost = 0;
> +
>  /*
>   * Set root entries for each VT-d engine.  After set root entry,
>   * must globally invalidate context cache, and then globally
> @@ -2133,8 +2136,8 @@ int __init intel_vtd_setup(void)
>  }
> 
>  /* We enable the following features only if they are supported by all VT-d
> - * engines: Snoop Control, DMA passthrough, Queued Invalidation and
> - * Interrupt Remapping.
> + * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
> + * Remapping, and Posted Interrupt
>   */
>  for_each_drhd_unit ( drhd )
>  {
> @@ -2162,6 +2165,15 @@ int __init intel_vtd_setup(void)
>  if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
>  iommu_intremap = 0;
> 
> +/*
> + * We cannot use posted interrupt if X86_FEATURE_CX16 is
> + * not supported, since we count on this feature to
> + * atomically update 16-byte IRTE in posted format.
> + */
> +if ( !iommu_intremap &&
> + (!cap_intr_post(iommu->cap) || !cpu_has_cx16) )
> +iommu_intpost = 0;
> +

This looks like a typo: the && should be ||.
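
To make the difference concrete, here is a minimal stand-alone sketch of the
check as a pure function. The function names and the boolean parameters are
hypothetical stand-ins for iommu_intpost, iommu_intremap,
cap_intr_post(iommu->cap) and cpu_has_cx16; in the patch these are globals
and macros, not arguments.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check with || between the terms: posted
 * interrupts are turned off if interrupt remapping is off, or the VT-d
 * engine lacks the capability, or the CPU lacks cmpxchg16b.
 */
static bool intpost_with_or(bool intpost, bool intremap,
                            bool hw_post, bool cx16)
{
    if ( !intremap || (!hw_post || !cx16) )
        intpost = false;
    return intpost;
}

/* The condition as posted (with &&), kept only for comparison. */
static bool intpost_as_posted(bool intpost, bool intremap,
                              bool hw_post, bool cx16)
{
    if ( !intremap && (!hw_post || !cx16) )
        intpost = false;
    return intpost;
}
```

With interrupt remapping enabled but no hardware posted-interrupt support,
the && form leaves the feature enabled, while the || form disables it.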

Thanks
Kevin



[Xen-devel] [PATCH 4.0 01/55] config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected

2015-07-08 Thread Greg Kroah-Hartman
4.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Konrad Rzeszutek Wilk 

commit a6dfa128ce5c414ab46b1d690f7a1b8decb8526d upstream.

A huge number of NIC drivers use the DMA API; however, if
compiled under 32-bit, a very important part of the DMA API can
be omitted, leading to the drivers not working at all
(especially if used with 'swiotlb=force iommu=soft').

As Prashant Sreedharan explains it: "the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of
the dma "mapping" and dma_unmap_addr() to get the "mapping"
value. On most of the platforms this is a no-op, but ... with
"iommu=soft and swiotlb=force" this house keeping is required,
... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_
instead of the DMA address."

As such enable this even when using 32-bit kernels.

Reported-by: Ian Jackson 
Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: David S. Miller 
Acked-by: Prashant Sreedharan 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Michael Chan 
Cc: Thomas Gleixner 
Cc: boris.ostrov...@oracle.com
Cc: casca...@linux.vnet.ibm.com
Cc: david.vra...@citrix.com
Cc: sanje...@broadcom.com
Cc: siva.kal...@broadcom.com
Cc: vyasev...@gmail.com
Cc: xen-de...@lists.xensource.com
Link: http://lkml.kernel.org/r/20150417190448.ga9...@l.oracle.com
Signed-off-by: Ingo Molnar 
Cc: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/x86/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -177,7 +177,7 @@ config SBUS
 
 config NEED_DMA_MAP_STATE
    def_bool y
-   depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG
+   depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG || SWIOTLB
 
 config NEED_SG_DMA_LENGTH
    def_bool y





Re: [Xen-devel] [PATCH v10 01/13] x86: add socket_cpumask

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 04:43,  wrote:
> On Tue, Jul 07, 2015 at 06:32:55PM -0400, Boris Ostrovsky wrote:
>> >@@ -245,6 +248,8 @@ static void set_cpu_sibling_map(int cpu)
>> >  cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
>> >+cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
>> 
>> This patch crashes Xen on my 32-cpu Intel box here for cpu 16, which is the
>> first CPU on the second socket (i.e. on socket 1).
>> 
>> The reason appears to be that cpu_to_socket(16) is (correctly) 1 here, but
>> ...
>> 
>> >+
>> >  if ( c[cpu].x86_num_siblings > 1 )
>> >  {
>> >  for_each_cpu ( i, &cpu_sibling_setup_map )
>> >@@ -649,7 +654,13 @@ void cpu_exit_clear(unsigned int cpu)
>> >  static void cpu_smpboot_free(unsigned int cpu)
>> >  {
>> >-unsigned int order;
>> >+unsigned int order, socket = cpu_to_socket(cpu);
>> >+
>> >+if ( cpumask_empty(socket_cpumask[socket]) )
>> >+{
>> >+free_cpumask_var(socket_cpumask[socket]);
>> >+socket_cpumask[socket] = NULL;
>> >+}
>> >  free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
>> >  free_cpumask_var(per_cpu(cpu_core_mask, cpu));
>> >@@ -694,6 +705,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
>> >  nodeid_t node = cpu_to_node(cpu);
>> >  struct desc_struct *gdt;
>> >  unsigned long stub_page;
>> >+unsigned int socket = cpu_to_socket(cpu);
>> 
>> ... is zero here, meaning that socket_cpumask[1] is NULL. I suspect that
>> phys_proc_id is probably not set at this point but is by the time we get to
>> set_cpu_sibling_map(). I haven't looked any further yet. I might do this
>> tomorrow unless Chao does it before me.
> 
> Thanks for testing.

Boris' report first of all raises the question: Did you test this at all
on a multi-socket system? Considering you not having tested the
CPU removal case either, I'm starting to wonder how much testing
this series has seen overall...

> I think I have found the reason. For AP, phys_proc_id is set in:
> start_secondary()=>smp_callin()=>smp_store_cpu_info()=>identify_cpu()
> which is behind cpu_smpboot_alloc() called from CPU_PREPARE.
> 
> One way would move 'zalloc_cpumask_var(socket_cpumask + socket)' to
> set_cpu_sibling_map() to fix it if Jan agrees that, otherwise other
> solution needs to be found.

Looks sensible at a first glance, but in order to be able to do
proper error handling the allocation needs to remain in
cpu_smpboot_alloc(). I.e. you'd add a static variable, pre-
allocate a cpumask into it if it's currently NULL, and consume the
allocation in set_cpu_sibling_map() (or maybe even better in
smp_store_cpu_info() right after the identify_cpu() call) if
socket_cpumask[socket] is NULL.
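
As a rough, stand-alone illustration of that pre-allocate/consume pattern
(not Xen code): cpumask_t here is a toy one-word stand-in, NR_SOCKETS and
the names prealloc_socket_mask(), consume_socket_mask() and
spare_socket_cpumask are hypothetical, and calloc() stands in for
zalloc_cpumask_var(). The sketch assumes the pre-allocation step has run
before the consume step and that cpu is smaller than the bit width of
unsigned long.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_SOCKETS 4                                /* hypothetical limit */

typedef struct { unsigned long bits; } cpumask_t;   /* toy stand-in */

static cpumask_t *socket_cpumask[NR_SOCKETS];
static cpumask_t *spare_socket_cpumask;             /* pre-allocated spare */

/*
 * Models the cpu_smpboot_alloc() side: the socket is not known yet, but
 * an allocation failure can still be reported to the caller here.
 */
static int prealloc_socket_mask(void)
{
    if ( spare_socket_cpumask == NULL )
    {
        spare_socket_cpumask = calloc(1, sizeof(*spare_socket_cpumask));
        if ( spare_socket_cpumask == NULL )
            return -1;
    }
    return 0;
}

/*
 * Models the later step, once the socket is known: it only consumes the
 * spare when the socket has no mask yet, so it cannot fail.
 */
static void consume_socket_mask(unsigned int socket, unsigned int cpu)
{
    if ( socket_cpumask[socket] == NULL )
    {
        socket_cpumask[socket] = spare_socket_cpumask;
        spare_socket_cpumask = NULL;
    }
    socket_cpumask[socket]->bits |= 1UL << cpu;
}
```

If the CPU lands on a socket that already has a mask, the spare simply stays
around for the next CPU that opens a new socket.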

And then you test this on an affected system, and submit
asap, so we can preferably avoid reverting the whole series.

Jan




Re: [Xen-devel] Performance problem about address translation

2015-07-08 Thread xinyue


On 2015年07月08日 14:26, xinyue wrote:

Very sorry for the earlier mis-sent message.
On 2015年07月08日 14:13, xinyue wrote:


On 2015年07月07日 19:49, Ian Campbell wrote:

On Tue, 2015-07-07 at 11:24 +0800, xinyue wrote:

Please don't use HTML mail and do proper ">" quoting


And after analyzing the performance of the HVM DomU, I found a process
named "evolution-data-" using almost 99.9% CPU. Does anyone know
what this is and why it appears?

evolution-data-server is part of the evolution mail client. It has
nothing to do with Xen I'm afraid so you will have to look elsewhere for
why it is taking so much CPU.

Ian.




Sorry for that and thanks very much.

I think the problem may be caused by address alignment. The HVM
DomU crashed after the hypercall, and Dom0 sometimes crashed later with
"Bus error".


I think the function that caused the crash is get_gfn. The related
code is


unsigned long gfn;
unsigned long mfn;
struct vcpu *vcpu = current;
struct domain *d = vcpu->domain;
uint32_t pfec = PFEC_page_present;
p2m_type_t t;
gfn = paging_gva_to_gfn(current, 0xc029, &pfec);
mfn = get_gfn(d, gfn, &t);

Am I missing some type conversion?


Thanks and best regards!

xinyue



Thanks for all the advice. I found that the problem appeared because I
forgot to call put_gfn.


Thanks again and best regards!

xinyue



Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> Extend struct pi_desc according to VT-d Posted-Interrupts Spec.
> 
> Signed-off-by: Feng Wu 

Acked-by: Kevin Tian 




Re: [Xen-devel] [v3 07/15] vmx: Initialize VT-d Posted-Interrupts Descriptor

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> This patch initializes the VT-d Posted-interrupt Descriptor.
> 
> Signed-off-by: Feng Wu 

Acked-by: Kevin Tian 




Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.

2015-07-08 Thread Dario Faggioli
[Trimming the Cc-list a bit, to avoid bothering Wei and Jan]

On Tue, 2015-07-07 at 22:56 -0700, Meng Xu wrote:
> Hi Dario,
> 
Hi,

> 2015-07-07 7:03 GMT-07:00 Dario Faggioli :
> >
> > On Mon, 2015-07-06 at 22:51 -0700, Meng Xu wrote:

> > So, it looks to me that, as far as (1) and (2) are concerned, since we
> > are "just" inserting a vCPU in the runq, if we have M pCPUs, and we know
> > whether we inserted it within the first M spots, we already have what we
> > want, or am I missing something? And if __runq_insert() now (with Dagaen
> > patch) tells us this, well, we can simplify the tickling logic, can't
> > we?
> 
> I think you might assume that the first M VCPUs  in the runq are the
> current running VCPUs on the M pCPUs. Am I correct? (From what you
> described in the following example, I think I'm correct. ;-) )
> 
Mmm... Interesting. Yes, I was. I was basing this assumption on this
chunk on Dagaen's patch:

// If we become one of top [# CPUs] in the runq, tickle it
// TODO: make this work when multiple tickles are required
if ( new_position > 0 && new_position <= prv->NUM_CPUS )
runq_tickle(ops, svc);

And forgot (and did not go check) about the __q_remove() in
rt_schedule(). My bad again.

But then, since we don't have the running vCPUs in the runq, how is the
code above supposed to be correct?

> > With an example:
> > We are waking up (or re-inserting, in rt_context_saved()) vCPU j. We
> > have 6 pCPUs. __runq_insert() tells us that it put vCPU j at the 3rd
> > place in the runq. This means vCPU j should be set to run as soon as
> > possible. So, if vCPU j is 3rd in runq, either
> >  (a) there are only 3 runnable vCPUs (i.e., if we are waking up j, there
> >  were 2 of them, and j is the third; if we are in context_saved,
> >  there already were 3, and j just got its deadline postponed, or
> >  someone else got its one replenished);
> >  (b) there are more than 3 runnable vCPUs, i.e., there is at least a 4th
> >  vCPU --say vCPU k-- in the runq, which was the 3rd before vCPU j
> >  was woken (or re-inserted), but now became the 4th, because
> >  deadline(j) < deadline(k).
> > In case (a), there are for sure idle pCPUs, and we should tickle one of
> > them.
> 
> I tell that you make the above assumption from here.
> 
> However, in the current implementation, runq does not hold the current
> running VCPUs on the pCPUs. We remove the vcpu from runq in
> rt_schedule() function. What you described above make perfect sense
> "if" we decide to make runq hold the current running VCPUs.
> 
Yep. And it indeed seems to me that we may well think about doing so. It
will make it possible to base on the position for making/optimizing
scheduling decisions, and at the same time I don't think I see much
downsides in that, do you?

> Actually, after thinking about the example you described, I think we
> can hold the current running VCPUs *and* the current idle pCPUs in the
> scheduler-wide structure; 
>
What do you mean with 'current idle pCPUs'? I said something similar as
well, and what I meant was a cpumask with bit i set if i-eth pCPU is
idle, do you also mean this?

About the running vCPUs, why just not leave them in the actual runq?

> In other words, we can have another runningq
> (not runq) and a idle_pcpu list in the rt_private; Now all VCPUs are
> stored in three queues: runningq, runq, and depletedq, in increasing
> priority order.
> 
Perhaps, but I'm not sure I see the need for another list. Again, why
just not leave them in runq? I appreciate this is a rather big  change
(although, perhaps it looks bigger said than done), but I think it could
be worth pursuing.

For double checking, asserting, and making sure that we are able to
identify the running svc-s, we have the __RTDS_scheduled flag.

> When we make the tickle decision, we only need to scan the idle_pcpu
> and then runningq to figure out which pCPU to tickle. All of other
> design you describe still hold here, except that the position where a
> VCPU is inserted into runq cannot directly give us which pCPU to
> tickle. What do you think?
> 
I think that I'd like to know why you think adding another queue is
necessary, instead of just leaving the vCPUs in the actual runq. Is
there something bad about that which I'm missing?

> > In case (b) there may be idle pCPUs (and, if that's the case, we
> > should tickle one of them, of course) or not. If not, we need to go
> > figure out which pCPU to tickle, which is exactly what runq_tickle()
> > does, but we at least know for sure that we want to tickle the pCPU
> > where vCPU k runs, or others where vCPUs with deadline greater than vCPU
> > k run.
> >
> > Does this make sense?
> 
> Yes, if we decide to hold the currently running VCPUs in
> scheduler-wide structure: it can be runq or runningq.
> 
Yes, but if we use two queues, we defeat at least part of this
optimization/simplification.
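
To illustrate the position-based idea, a minimal stand-alone sketch follows.
It is only meaningful under the assumption being discussed, namely that the
running vCPUs stay in the runq, so that the first M entries correspond to
what the M pCPUs should be running. struct runq, runq_insert(),
needs_tickle() and demo_q are hypothetical; the real runq is a list of
rt_vcpu structures, not an array of deadlines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RUNQ_MAX 16                  /* hypothetical capacity */

struct runq {
    uint64_t deadline[RUNQ_MAX];     /* kept sorted, earliest first */
    size_t len;
};

static struct runq demo_q;           /* zero-initialised: empty queue */

/*
 * Insert a deadline, keeping EDF order. Returns the 1-based position of
 * the new entry, or 0 if the queue is full. Entries with an equal
 * deadline stay ahead of the newcomer.
 */
static size_t runq_insert(struct runq *q, uint64_t deadline)
{
    size_t pos, i;

    if ( q->len >= RUNQ_MAX )
        return 0;

    for ( pos = 0; pos < q->len && q->deadline[pos] <= deadline; pos++ )
        ;

    for ( i = q->len; i > pos; i-- )
        q->deadline[i] = q->deadline[i - 1];

    q->deadline[pos] = deadline;
    q->len++;

    return pos + 1;
}

/* Tickle only if the new entry landed within the first nr_cpus slots. */
static int needs_tickle(size_t position, size_t nr_cpus)
{
    return position > 0 && position <= nr_cpus;
}
```

This only answers whether to tickle; choosing which pCPU to tickle is still
the job of runq_tickle().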

> > Still, I think I gave enough material for an actual optimization. What
> > do you 

Re: [Xen-devel] [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection

2015-07-08 Thread Wu, Feng


> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 3:32 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection
> 
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > This patch adds feature detection logic for VT-d posted-interrupt.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > - Remove the "if no intremap then no intpost" logic in
> >   intel_vtd_setup(), it is covered in the iommu_setup().
> > - Add "if no intremap then no intpost" logic in the end
> >   of init_vtd_hw() which is called by vtd_resume().
> >
> > So the logic exists in the following three places:
> > - parse_iommu_param()
> > - iommu_setup()
> > - init_vtd_hw()
> >
> >  xen/drivers/passthrough/vtd/iommu.c | 18 --
> >  xen/drivers/passthrough/vtd/iommu.h |  1 +
> >  2 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/vtd/iommu.c
> > b/xen/drivers/passthrough/vtd/iommu.c
> > index 9053a1f..4221185 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -2071,6 +2071,9 @@ static int init_vtd_hw(void)
> >  disable_intremap(drhd->iommu);
> >  }
> >
> > +if ( !iommu_intremap )
> > +iommu_intpost = 0;
> > +
> >  /*
> >   * Set root entries for each VT-d engine.  After set root entry,
> >   * must globally invalidate context cache, and then globally
> > @@ -2133,8 +2136,8 @@ int __init intel_vtd_setup(void)
> >  }
> >
> >  /* We enable the following features only if they are supported by all VT-d
> > - * engines: Snoop Control, DMA passthrough, Queued Invalidation and
> > - * Interrupt Remapping.
> > + * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
> > + * Remapping, and Posted Interrupt
> >   */
> >  for_each_drhd_unit ( drhd )
> >  {
> > @@ -2162,6 +2165,15 @@ int __init intel_vtd_setup(void)
> >  if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
> >  iommu_intremap = 0;
> >
> > +/*
> > + * We cannot use posted interrupt if X86_FEATURE_CX16 is
> > + * not supported, since we count on this feature to
> > + * atomically update 16-byte IRTE in posted format.
> > + */
> > +if ( !iommu_intremap &&
> > + (!cap_intr_post(iommu->cap) || !cpu_has_cx16) )
> > +iommu_intpost = 0;
> > +
> 
> This looks like a typo: the && should be ||.

Yes, this is a typo. Thanks for the review.

Thanks,
Feng
> 
> Thanks
> Kevin



Re: [Xen-devel] [PATCH v25 00/15] x86/PMU: Xen PMU PV(H) support

2015-07-08 Thread Jan Beulich
>>> On 19.06.15 at 20:44,  wrote:

While making another scan through this series now that some more
reviews from Dietmar are trickling in, I notice:

> Boris Ostrovsky (15):
>   common/symbols: Export hypervisor symbols to privileged guest
>   x86/VPMU: Add public xenpmu.h
>   x86/VPMU: Make vpmu not HVM-specific
>   x86/VPMU: Interface for setting PMU mode and flags

still missing a VMX maintainer's ack

>   x86/VPMU: Initialize VPMUs with __initcall

same here plus no review (albeit I wouldn't make the latter a
requirement)

>   x86/VPMU: Initialize PMU for PV(H) guests

same regarding review state

>   x86/VPMU: Save VPMU state for PV guests during context switch
>   x86/VPMU: When handling MSR accesses, leave fault injection to callers

again same regarding review state

>   x86/VPMU: Add support for PMU register handling on PV guests
>   x86/VPMU: Use pre-computed masks when checking validity of MSRs
>   VPMU/AMD: Check MSR values before writing to hardware

no review yet (and here I'd really like to have one)

>   x86/VPMU: Handle PMU interrupts for PV(H) guests

same here

>   x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
>   x86/VPMU: Add privileged PMU mode

here a review would again be nice, but I'd again not make it a
requirement

>   x86/VPMU: Move VPMU files up from hvm/ directory

Jan




Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 09:06,  wrote:

> 
>> -----Original Message-----
>> From: xen-devel-boun...@lists.xen.org 
>> [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
>> Sent: Thursday, June 25, 2015 2:35 AM
>> To: Wu, Feng; xen-devel@lists.xen.org 
>> Cc: george.dun...@eu.citrix.com; Zhang, Yang Z; Tian, Kevin; k...@xen.org;
>> jbeul...@suse.com 
>> Subject: Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
>> 
>> On 24/06/15 06:18, Feng Wu wrote:
>> > This patch adds cmpxchg16b support for x86-64, so software
>> > can perform 128-bit atomic write/read.
>> >
>> > Signed-off-by: Feng Wu 
>> > ---
>> > v3:
>> > Newly added.
>> >
>> >  xen/include/asm-x86/x86_64/system.h | 28
>> 
>> >  xen/include/xen/types.h |  5 +
>> >  2 files changed, 33 insertions(+)
>> >
>> > diff --git a/xen/include/asm-x86/x86_64/system.h
>> b/xen/include/asm-x86/x86_64/system.h
>> > index 662813a..a910d00 100644
>> > --- a/xen/include/asm-x86/x86_64/system.h
>> > +++ b/xen/include/asm-x86/x86_64/system.h
>> > @@ -6,6 +6,34 @@
>> > (unsigned
>> long)(n),sizeof(*(ptr
>> >
>> >  /*
>> > + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
>> > + * identical, store NEW in MEM.  Return the initial value in MEM.
>> > + * Success is indicated by comparing RETURN with OLD.
>> > + *
>> > + * This function can only be called when cpu_has_cx16 is true.
>> > + */
>> > +
>> > +static always_inline uint128_t __cmpxchg16b(
>> > +volatile void *ptr, uint128_t old, uint128_t new)
>> 
>> It is not nice for register scheduling taking uint128_t's by value.
>> Instead, I would pass them by pointer and let the inlining sort the
>> eventual references out.
>> 
>> > +{
>> > +uint128_t prev;
>> > +
>> > +ASSERT(cpu_has_cx16);
>> 
>> Given that if this assertion were to fail, cmpxchg16b would fail with
>> #UD, I would hand-code a asm_fixup section which in turn panics.  This
>> avoids a situation where non-debug builds could die with an unqualified
>> #UD exception.
> 
> Is there an existing way to panic the hypervisor from assembler code? I
> couldn't find one; I would appreciate it if you could point it out.

I'm not convinced such a #UD would be a significant problem: Looking
at the disassembly will show the cause right away. The out of line
ud2-s in some of VMX'es inline assembly wrappers are far worse.

As to panic()ing from assembly code:

movq$, %rdi
callpanic

>> Also, you must enforce 16-byte alignment of the memory reference, as
>> described in the manual.
> 
> What should I do if the caller passes data that is not 16-byte aligned
> (struct iremap_entry in this case)? Does this mean I need to define
> it like this?
> 
> struct iremap_entry {
> 
> ..
> 
> } __attribute__ ((aligned (16)));

How would that help? The table entries hardware uses are supposed
to be 16-byte aligned anyway, aren't they? I think Andrew's "enforce"
really means ASSERT() or BUG_ON(), again to avoid an unqualified
exception. However - see above.
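
For illustration only, such a check could look roughly like the stand-alone
sketch below. The names u128_t, is_aligned_16(), aligned_entry_t and
demo_table are hypothetical, and plain assert() would play the role of
ASSERT()/BUG_ON() here. The sketch also shows that forcing alignment through
the type is a separate mechanism from checking a pointer at the call site.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two-word 128-bit type from the patch. */
typedef struct {
    uint64_t low;
    uint64_t high;
} u128_t;

/* True if ptr meets the 16-byte alignment cmpxchg16b needs. */
static inline bool is_aligned_16(const volatile void *ptr)
{
    return ((uintptr_t)ptr & 15) == 0;
}

/* Hypothetical table entry whose type forces 16-byte alignment. */
typedef struct {
    alignas(16) u128_t val;
} aligned_entry_t;

/* Every element of an array of such entries is 16-byte aligned. */
static aligned_entry_t demo_table[2];
```

A caller-side check would then be something like
assert(is_aligned_16(ptr)) ahead of the inline assembly.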

Plus, all that said, without having seen the actual use sites of
cmpxchg16b yet, I'm not at all convinced we really need this patch.

Jan




[Xen-devel] [linux-3.4 test] 59139: regressions - FAIL

2015-07-08 Thread osstest service owner
flight 59139 linux-3.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59139/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64  6 xen-boot  fail REGR. vs. 30511

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-sedf-pin  6 xen-boot   fail in 58831 pass in 58798
 test-amd64-amd64-xl   6 xen-boot   fail in 59091 pass in 59139
 test-amd64-amd64-pair 10 xen-boot/dst_host   fail pass in 58798
 test-amd64-amd64-pair 9 xen-boot/src_host   fail pass in 58798
 test-amd64-i386-pair 10 xen-boot/dst_host   fail pass in 58831
 test-amd64-i386-pair  9 xen-boot/src_host   fail pass in 58831
 test-amd64-i386-xl-qemuu-win7-amd64  9 windows-install  fail pass in 59091

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-multivcpu  6 xen-boot   fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-libvirt-xsm  6 xen-bootfail baseline untested
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 6 xen-boot fail baseline 
untested
 test-amd64-i386-libvirt-xsm   6 xen-bootfail baseline untested
 test-amd64-amd64-xl-credit2   6 xen-bootfail baseline untested
 test-amd64-i386-xl-xsm6 xen-bootfail baseline untested
 test-amd64-amd64-xl-xsm   6 xen-bootfail baseline untested
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate 
fail baseline untested
 test-amd64-amd64-xl-sedf  6 xen-boot  fail in 58831 like 30406
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail in 59091 baseline untested
 test-amd64-i386-libvirt  11 guest-start  fail   like 30511
 test-amd64-amd64-libvirt 11 guest-start  fail   like 30511
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 30511
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 30511
 test-amd64-amd64-xl-qemuu-ovmf-amd64  6 xen-boot fail like 53709-bisect
 test-amd64-i386-xl 6 xen-boot fail like 53725-bisect
 test-amd64-i386-freebsd10-amd64  6 xen-boot fail like 58780-bisect
 test-amd64-i386-xl-qemuu-winxpsp3  6 xen-boot fail like 58786-bisect
 test-amd64-i386-qemut-rhel6hvm-intel  6 xen-boot fail like 58788-bisect
 test-amd64-i386-rumpuserxen-i386  6 xen-boot fail like 58799-bisect
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  6 xen-boot fail like 58801-bisect
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  6 xen-boot fail like 58803-bisect
 test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot fail like 58804-bisect
 test-amd64-i386-freebsd10-i386  6 xen-boot fail like 58805-bisect
 test-amd64-i386-xl-qemuu-ovmf-amd64  6 xen-boot fail like 58806-bisect
 test-amd64-amd64-xl-qemuu-winxpsp3  6 xen-boot fail like 58807-bisect
 test-amd64-i386-xl-qemut-winxpsp3  6 xen-boot fail like 58808-bisect
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  6 xen-boot fail like 58809-bisect
 test-amd64-amd64-rumpuserxen-amd64  6 xen-boot fail like 58810-bisect
 test-amd64-i386-xl-qemuu-debianhvm-amd64  6 xen-boot fail like 58811-bisect
 test-amd64-amd64-xl-qemut-debianhvm-amd64  6 xen-boot fail like 58813-bisect
 test-amd64-i386-qemuu-rhel6hvm-intel  6 xen-boot fail like 58814-bisect
 test-amd64-i386-xl-qemut-debianhvm-amd64  6 xen-boot fail like 58815-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail in 58831 never pass
 test-amd64-i386-libvirt  12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail in 58831 never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 59091 never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass

version targeted for testing:
 linux cf1b3dad6c5699b977273276bada8597636ef3e2
baseline version:
 linux bb4a05a0400ed6d2f1e13d1f82f289ff74300a70

Last test of basis 30511  2014-09-29 16:37:46 Z  281 days
Failing since      32004  2014-12-02 04:10:03 Z  218 days  167 attempts
Testing same since 58781  2015-06-20 14:15:50 Z   17 days   21 attempts


500 people touched revisions under test,

Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Campbell
On Wed, 2015-07-08 at 08:54 +0800, Chen, Tiejun wrote:
> >> +"none" is the default value and it means we don't check any reserved 
> >> regions
> >> +and then all rdm policies would be ignored. Guest just works as before and
> >> +the conflict of RDM and guest address space wouldn't be handled, and then
> >> +this may result in the associated device not being able to work or even 
> >> crash
> >> +the VM. So if you're assigning this kind of device, this option is not
> >> +recommended unless you can make sure any conflict doesn't exist.
> >> +
> >
> > One issue didn't come to conclusion during last round of review. Ian was
> > asking what's the difference with type=none vs not specifying rdm option
> > at all.
> >
> > You need to either convince Ian or remove "type=none" in *xl* level.
> > I.e. don't touch the libxl IDL. It still needs a none type.
> 
> I'll update this next revision. And also rephrase this doc to address 
> your comments below.

FTR I think I indicated yesterday that I was satisfied with your
explanation for why type=none exists as an option even at the xl level,
namely that it allows us to change the default in the future.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 for Xen 4.6 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler

2015-07-08 Thread Dario Faggioli
On Tue, 2015-07-07 at 23:06 -0700, Meng Xu wrote:
> 2015-07-07 7:39 GMT-07:00 Dario Faggioli :
> > On Tue, 2015-07-07 at 09:59 +0100, Jan Beulich wrote:
> >> >>> On 29.06.15 at 04:44,  wrote:
> >> > --- a/xen/common/Makefile
> >> > +++ b/xen/common/Makefile
> >> > @@ -31,7 +31,6 @@ obj-y += rbtree.o
> >> >  obj-y += rcupdate.o
> >> >  obj-y += sched_credit.o
> >> >  obj-y += sched_credit2.o
> >> > -obj-y += sched_sedf.o
> >> >  obj-y += sched_arinc653.o
> >> >  obj-y += sched_rt.o
> >> >  obj-y += schedule.o
> >>
> >> Stray change. Or perhaps the file doesn't build anymore, in which case
> >> you should instead have stated that the patch is dependent upon the
> >> series removing SEDF.
> >>
> > This indeed does not belong in here. And of course, things should
> > build... So, Chong, either deal with SEDF as well, if basing your
> > patches on a tree where it is still there, or base on top of my patches,
> > ignore it, but state the dependency, as Jan is asking.
> >
> >> > @@ -1157,8 +1158,75 @@ rt_dom_cntl(
> >
> >> > +case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
> >> > +spin_lock_irqsave(&prv->lock, flags);
> >> > +for( index = 0; index < op->u.v.nr_vcpus; index++ )
> >> > +{
> >> > +if ( copy_from_guest_offset(&local_sched,
> >> > +op->u.v.vcpus, index, 1) )
> >> > +{
> >> > +rc = -EFAULT;
> >> > +break;
> >> > +}
> >> > +if ( local_sched.vcpuid >= d->max_vcpus
> >> > +|| d->vcpu[local_sched.vcpuid] == NULL )
> >> > +{
> >> > +rc = -EINVAL;
> >> > +break;
> >> > +}
> >> > +svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
> >> > +svc->period = MICROSECS(local_sched.s.rtds.period);
> >> > +svc->budget = MICROSECS(local_sched.s.rtds.budget);
> >>
> >> Are all input values valid here?
> >>
> > That's a good point, actually. Right now, SEDF does some range
> > enforcement, by means of these values:
> >
> > #define PERIOD_MAX MILLISECS(10000) /* 10s  */
> > #define PERIOD_MIN (MICROSECS(10))  /* 10us */
> > #define SLICE_MIN (MICROSECS(5))/*  5us */
> >
> > Chong, it probably makes sense to (in a separate patch), introduce
> > something like this in RTDS too (with SLICE_MIN-->BUDGET_MIN), and then
> > use them, in this patch, for sanity checking the input.
> >
> > It also makes sense to check and enforce budget<=period, IMO.
> >
> > About the specific values, I'm open to proposals. I think something like
> > the SEDF's one is fine. Meng?
> 
> We are trying to make some range enforcement for RTDS scheduler. Is my
> understanding correct? (It should be, but just in case. :-) )
> 
We are wondering whether that could be necessary/useful, and IMO, it
would.

> As to the range of period, I think the max value can be as large as
> the type of period (ie. s_time_t) can represent. When we want a
> dedicated CPU for a guest, we will set budget=period and  can set the
> period to a very very large value to avoid the unnecessarily
> invocation of the scheduler.
>
Makes sense. We do have STIME_MAX and, given that period is something
that is added to current time during scheduling, STIME_DELTA_MAX.

Maybe put something together based on those?

> As to the min value of period, I think it should be >=100us. The
> scheduler overhead of running a large box could be 1us if the runq is
> long and competetion of the runq lock is heavy. If the scheduler is
> potentially invoked every 10us, the scheduler overhead will be 10% of
> total computation time, which seems a lot to me.
> 
Ok.

> As to the range of budget, the min value can be 5us, the same with
> SEDF; 
>
Well, wouldn't the above reasoning about overhead apply here too?
Budgets of 5us mean the scheduler can be invoked every 5us for budget
enforcement. If 10us was unreasonable, 5 is even more so.

Therefore, 100us here too? Or maybe let's allow for lower values (like
50us or 10us), but print a warning?

> the max value is the value of period of the same VCPU.
> 
Yep.

And, whatever the values, it would be useful to have comments somewhere
(either when the values are defined or enforced), stating what you said
above.
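
For illustration, the checks discussed above could be sketched as follows.
This is only a sketch under assumptions: the RTDS_* names and the 100us
minimums are hypothetical placeholders (the actual values are still open),
not existing Xen definitions, and the upper bound merely keeps the
microsecond-to-nanosecond conversion within a signed 64-bit time value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds, in microseconds. Placeholders, not Xen definitions. */
#define RTDS_PERIOD_MIN_US  UINT64_C(100)
#define RTDS_BUDGET_MIN_US  UINT64_C(100)
/* Largest period whose nanosecond value still fits a signed 64-bit time. */
#define RTDS_PERIOD_MAX_US  ((uint64_t)INT64_MAX / 1000)

/* Return true if (period, budget) is an acceptable parameter pair. */
static bool rtds_params_valid(uint64_t period_us, uint64_t budget_us)
{
    /* The period must lie within the allowed range. */
    if ( period_us < RTDS_PERIOD_MIN_US || period_us > RTDS_PERIOD_MAX_US )
        return false;

    /* The budget must not be too small and must not exceed the period. */
    if ( budget_us < RTDS_BUDGET_MIN_US || budget_us > period_us )
        return false;

    return true;
}
```

With something like this, the putvcpuinfo loop could return -EINVAL before
touching svc->period and svc->budget whenever the check fails.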

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, July 08, 2015 4:13 PM
> To: Wu, Feng
> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
> xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
> 
> >>> On 08.07.15 at 09:06,  wrote:
> 
> >
> >> -Original Message-
> >> From: xen-devel-boun...@lists.xen.org
> >> [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
> >> Sent: Thursday, June 25, 2015 2:35 AM
> >> To: Wu, Feng; xen-devel@lists.xen.org
> >> Cc: george.dun...@eu.citrix.com; Zhang, Yang Z; Tian, Kevin; k...@xen.org;
> >> jbeul...@suse.com
> >> Subject: Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
> >>
> >> On 24/06/15 06:18, Feng Wu wrote:
> >> > This patch adds cmpxchg16b support for x86-64, so software
> >> > can perform 128-bit atomic write/read.
> >> >
> >> > Signed-off-by: Feng Wu 
> >> > ---
> >> > v3:
> >> > Newly added.
> >> >
> >> >  xen/include/asm-x86/x86_64/system.h | 28
> >> 
> >> >  xen/include/xen/types.h |  5 +
> >> >  2 files changed, 33 insertions(+)
> >> >
> >> > diff --git a/xen/include/asm-x86/x86_64/system.h
> >> b/xen/include/asm-x86/x86_64/system.h
> >> > index 662813a..a910d00 100644
> >> > --- a/xen/include/asm-x86/x86_64/system.h
> >> > +++ b/xen/include/asm-x86/x86_64/system.h
> >> > @@ -6,6 +6,34 @@
> >> > (unsigned
> >> long)(n),sizeof(*(ptr
> >> >
> >> >  /*
> >> > + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
> >> > + * identical, store NEW in MEM.  Return the initial value in MEM.
> >> > + * Success is indicated by comparing RETURN with OLD.
> >> > + *
> >> > + * This function can only be called when cpu_has_cx16 is ture.
> >> > + */
> >> > +
> >> > +static always_inline uint128_t __cmpxchg16b(
> >> > +volatile void *ptr, uint128_t old, uint128_t new)
> >>
> >> It is not nice for register scheduling taking uint128_t's by value.
> >> Instead, I would pass them by pointer and let the inlining sort the
> >> eventual references out.
> >>
> >> > +{
> >> > +uint128_t prev;
> >> > +
> >> > +ASSERT(cpu_has_cx16);
> >>
> >> Given that if this assertion were to fail, cmpxchg16b would fail with
> >> #UD, I would hand-code a asm_fixup section which in turn panics.  This
> >> avoids a situation where non-debug builds could die with an unqualified
> >> #UD exception.
> >
> > Is there an existing way to panic the hypervisor in assembler code, I
> > don't find it, it would be appreciated if you can point it out.
> 
> I'm not convinced such a #UD would be a significant problem: Looking
> at the disassembly will show the cause right away. The out of line
> ud2-s in some of VMX'es inline assembly wrappers are far worse.
> 

So, do you agree with the fixup section or not?

> As to panic()ing from assembly code:
> 
>   movq$, %rdi
>   callpanic
> 
> >> Also, you must enforce 16-byte alignment of the memory reference, as
> >> described in the manual.
> >
> > What should I do if the caller passes an non 16-byte alignment data
> > (struct iremap_entry in this case) ? Do this mean I need to define
> > it like this?
> >
> > struct iremap_entry {
> >
> > ..
> >
> > } __attribute__ ((aligned (16)));
> 
> How would that help? The table entries hardware uses are supposed
> to be 16-byte aligned anyway, aren't they?

Oh, yes, the base address of the remapping table is 4K aligned.
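
To spell out the reasoning: with a 4K-aligned table base and 16-byte
entries, every entry address is a multiple of 16. A minimal sketch of that
arithmetic, where struct entry128 is a hypothetical stand-in for
struct iremap_entry and the helper names are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit table entry: two 64-bit halves, 16 bytes in total. */
struct entry128 {
    uint64_t lo;
    uint64_t hi;
};

/* True if p is 16-byte aligned, as a cmpxchg16b memory operand must be. */
static bool is_16byte_aligned(const volatile void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Address of entry idx in a table that starts at base. */
static uintptr_t entry_addr(uintptr_t base, size_t idx)
{
    return base + idx * sizeof(struct entry128);
}
```

A check like is_16byte_aligned() is the kind of condition an ASSERT() or
BUG_ON() in the wrapper would test.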

> I think Andrew's "enforce"
> really means ASSERT() or BUG_ON(), again to avoid an unqualified
> exception. However - see above.
> 
> Plus, all that said, without having seen the actual use sites of
> cmpxchg16b yet, I'm not at all convinced we really need this patch.

After introducing the posted format in the IRTE, some fields span both the
high and the low 64 bits, such as pda_h and pda_l. How can we make sure
the pda field is updated atomically?

Thanks,
Feng

> 
> Jan




Re: [Xen-devel] Xen-unstable: pci-passthrough of device using MSI-X interrupts not working after commit x86/MSI: track host and guest masking separately

2015-07-08 Thread Sander Eikelenboom

Tuesday, July 7, 2015, 6:08:25 PM, you wrote:

 On 26.06.15 at 17:48,  wrote:
>> On 2015-06-26 17:22, Jan Beulich wrote:
>>> I have an idea: In
>>> 
>>> static unsigned int startup_msi_irq(struct irq_desc *desc)
>>> {
>>> bool_t guest_masked = (desc->status & IRQ_GUEST) &&
>>>   is_hvm_domain(desc->msi_desc->dev->domain);
>>> 
>>> if ( unlikely(!msi_set_mask_bit(desc, 0, guest_masked)) )
>>> WARN();
>>> return 0;
>>> }
>>> 
>>> I think we need to also exclude the emuirq case (which is what I
>>> understand backs the pvhvm interrupt in the guest - Stefano,
>>> please confirm). For testing purposes, could you try simply passing
>>> zero instead of guest_masked here?
>> 
>> I can confirm, with 0 it works !

> Okay, here's something that hopefully could go in (provided of
> course it too works for you).

Hi Jan,

Just tested and it works fine :-)

--
Sander

> Jan

> --- unstable.orig/xen/arch/x86/irq.c 2015-07-07 17:56:52.0 +0200
> +++ unstable/xen/arch/x86/irq.c   2015-07-07 17:04:08.0 +0200
> @@ -2502,6 +2502,25 @@ int unmap_domain_pirq_emuirq(struct doma
>  return ret;
>  }
>  
> +void arch_evtchn_bind_pirq(struct domain *d, int pirq)
> +{
> +int irq = domain_pirq_to_irq(d, pirq);
> +struct irq_desc *desc;
> +unsigned long flags;
> +
> +if ( irq <= 0 )
> +return;
> +
> +if ( is_hvm_domain(d) )
> +map_domain_emuirq_pirq(d, pirq, IRQ_PT);
> +
> +desc = irq_to_desc(irq);
> +spin_lock_irqsave(&desc->lock, flags);
> +if ( desc->msi_desc )
> +guest_mask_msi_irq(desc, 0);
> +spin_unlock_irqrestore(&desc->lock, flags);
> +}
> +
>  bool_t hvm_domain_use_pirq(const struct domain *d, const struct pirq *pirq)
>  {
>  return is_hvm_domain(d) && pirq &&
> --- unstable.orig/xen/arch/x86/msi.c 2015-07-07 17:56:53.0 +0200
> +++ unstable/xen/arch/x86/msi.c   2015-07-07 16:50:02.0 +0200
> @@ -422,10 +422,7 @@ void guest_mask_msi_irq(struct irq_desc 
>  
>  static unsigned int startup_msi_irq(struct irq_desc *desc)
>  {
> -bool_t guest_masked = (desc->status & IRQ_GUEST) &&
> -  is_hvm_domain(desc->msi_desc->dev->domain);
> -
> -msi_set_mask_bit(desc, 0, guest_masked);
> +msi_set_mask_bit(desc, 0, !!(desc->status & IRQ_GUEST));
>  return 0;
>  }
>  
> --- unstable.orig/xen/common/event_channel.c 2015-07-07 17:56:51.0 +0200
> +++ unstable/xen/common/event_channel.c   2015-07-07 16:53:47.0 +0200
> @@ -456,10 +456,7 @@ static long evtchn_bind_pirq(evtchn_bind
>  
>  bind->port = port;
>  
> -#ifdef CONFIG_X86
> -if ( is_hvm_domain(d) && domain_pirq_to_irq(d, pirq) > 0 )
> -map_domain_emuirq_pirq(d, pirq, IRQ_PT);
> -#endif
> +arch_evtchn_bind_pirq(d, pirq);
>  
>   out:
>  spin_unlock(&d->event_lock);
> --- unstable.orig/xen/include/asm-arm/irq.h 2015-07-07 17:56:49.0 +0200
> +++ unstable/xen/include/asm-arm/irq.h  2015-07-07 17:02:00.0 +0200
> @@ -48,6 +48,8 @@ int release_guest_irq(struct domain *d, 
>  
>  void arch_move_irqs(struct vcpu *v);
>  
> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
> +
>  /* Set IRQ type for an SPI */
>  int irq_set_spi_type(unsigned int spi, unsigned int type);
>  
> --- unstable.orig/xen/include/xen/irq.h   2015-07-07 17:56:49.0 +0200
> +++ unstable/xen/include/xen/irq.h  2015-07-07 17:02:49.0 +0200
> @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
>  unsigned int arch_hwdom_irqs(domid_t);
>  #endif
>  
> +#ifndef arch_evtchn_bind_pirq
> +void arch_evtchn_bind_pirq(struct domain *, int pirq);
> +#endif
> +
>  #endif /* __XEN_IRQ_H__ */






Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 10:33,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, July 08, 2015 4:13 PM
>> >>> On 08.07.15 at 09:06,  wrote:
>> >> From: xen-devel-boun...@lists.xen.org 
>> >> [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
>> >> Sent: Thursday, June 25, 2015 2:35 AM
>> >> On 24/06/15 06:18, Feng Wu wrote:
>> >> > +{
>> >> > +uint128_t prev;
>> >> > +
>> >> > +ASSERT(cpu_has_cx16);
>> >>
>> >> Given that if this assertion were to fail, cmpxchg16b would fail with
>> >> #UD, I would hand-code a asm_fixup section which in turn panics.  This
>> >> avoids a situation where non-debug builds could die with an unqualified
>> >> #UD exception.
>> >
>> > Is there an existing way to panic the hypervisor in assembler code, I
>> > don't find it, it would be appreciated if you can point it out.
>> 
>> I'm not convinced such a #UD would be a significant problem: Looking
>> at the disassembly will show the cause right away. The out of line
>> ud2-s in some of VMX'es inline assembly wrappers are far worse.
> 
> So, do you agree with the fixup section or not?

I'd rather not go that route, unless Andrew or you manage to
convince me otherwise.

>> I think Andrew's "enforce"
>> really means ASSERT() or BUG_ON(), again to avoid an unqualified
>> exception. However - see above.
>> 
>> Plus, all that said, without having seen the actual use sites of
>> cmpxchg16b yet, I'm not at all convinced we really need this patch.
> 
> After introducing posted format in IRTE, some fields exist in both the
> High 64 bit and the low 64 bit,such as pda_h and pda_l, how to make
> sure it is atomic when updating the pda field?

Is there a need for updating these _after_ initially setting up an
entry?

Jan




Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Sander Eikelenboom

Monday, July 6, 2015, 11:33:09 AM, you wrote:

 On 26.06.15 at 17:57,  wrote:
>> On 2015-06-26 17:51, Jan Beulich wrote:
>> On 26.06.15 at 17:41,  wrote:
 from 3.16 to 3.19 we gained a lot of these, if i remember correctly
 related to
 perf being enabled in the kernel:
 
 +   traps.c:2655:d0v0 Domain attempted WRMSR c081 from
 0xe023e008 to 0x00230010.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c082 from
 0x82d0b000 to 0x81bc2670.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c083 from
 0x82d0b020 to 0x81bc4630.
>>> 
>>> These are the SYSCALL (STAR) MSRs, which the kernel has no business
>>> touching when running on Xen.
>>> 
 from 3.19 to 4.0 we gained:
 +   d0 attempted to change d0v0's CR4 flags 0660 -> 0760
 +   d0 attempted to change d0v1's CR4 flags 0660 -> 0760
 +   d0 attempted to change d0v2's CR4 flags 0660 -> 0760
 +   d0 attempted to change d0v3's CR4 flags 0660 -> 0760
 +   d0 attempted to change d0v4's CR4 flags 0660 -> 0760
 +   d0 attempted to change d0v5's CR4 flags 0660 -> 0760
>>> 
>>> This is X86_CR4_PCE - not sure how to properly handle that.
>>> Andrew, you're fiddling with the CR4 handling right now anyway -
>>> any thoughts?
>>> 
 and from 4.0 to 4.1 we gained the ones you were interested in:
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
>>> 
>>> For these to be meaningful you need to translate them to symbolic
>>> addresses. (And yes, we should see to make the code print them
>>> in a more useful manner.)
>> 
>> How ?

> addr2line against xen-syms (or xen.efi if you use that one). And of
> course the result may need manual adjustment to account for
> eventual patches you have in your tree.

> Jan

Ah yeah .. silly me .. somehow I had in mind it would be kernel addresses
instead of Xen ones, so running it against vmlinux of course led nowhere.

Here we go:

(XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 -> ffff82d080239d85
(XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 -> ffff82d080239d85

which leads to:
# addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080195583
/usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758

# addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080239d85
??:?

Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:

case MSR_EFER:
 rdmsr_normal:
/* Everyone can read the MSR space. */
/* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
_p(regs->ecx));*/
HERE -->if ( rdmsr_safe(regs->ecx, val) )
goto fail;
 rdmsr_writeback:
regs->eax = (uint32_t)val;
regs->edx = (uint32_t)(val >> 32);
break;
}
break;




Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Andrew Cooper
On 08/07/2015 09:12, Jan Beulich wrote:
>
>>>
 +{
 +uint128_t prev;
 +
 +ASSERT(cpu_has_cx16);
>>> Given that if this assertion were to fail, cmpxchg16b would fail with
>>> #UD, I would hand-code a asm_fixup section which in turn panics.  This
>>> avoids a situation where non-debug builds could die with an unqualified
>>> #UD exception.
>> Is there an existing way to panic the hypervisor in assembler code, I
>> don't find it, it would be appreciated if you can point it out.

When I asked for this, I was thinking of having an assertion frame with
the cmpxchg16b instruction in the place of the regular ud2a.  This way,
if it were to fail with #UD, there would be a more useful error message.

However, there is no easy way of doing this at the moment, and it is an
obscure set of circumstances, so probably not worth the hassle.

> I'm not convinced such a #UD would be a significant problem: Looking
> at the disassembly will show the cause right away. The out of line
> ud2-s in some of VMX'es inline assembly wrappers are far worse.

Unqualified #UDs are harder to debug than qualified ones, and I have an
annoying habit of hitting them.  In some copious free time, I want to
continue the work started with c/s 0a3e27e and 881d6bf.  git grep
suggests there isn't actually too much to fix up in this regard.

~Andrew



Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, July 08, 2015 4:44 PM
> To: Wu, Feng
> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
> xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
> 
> >>> On 08.07.15 at 10:33,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Wednesday, July 08, 2015 4:13 PM
> >> >>> On 08.07.15 at 09:06,  wrote:
> >> >> From: xen-devel-boun...@lists.xen.org
> >> >> [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
> >> >> Sent: Thursday, June 25, 2015 2:35 AM
> >> >> On 24/06/15 06:18, Feng Wu wrote:
> >> >> > +{
> >> >> > +uint128_t prev;
> >> >> > +
> >> >> > +ASSERT(cpu_has_cx16);
> >> >>
> >> >> Given that if this assertion were to fail, cmpxchg16b would fail with
> >> >> #UD, I would hand-code a asm_fixup section which in turn panics.  This
> >> >> avoids a situation where non-debug builds could die with an unqualified
> >> >> #UD exception.
> >> >
> >> > Is there an existing way to panic the hypervisor in assembler code, I
> >> > don't find it, it would be appreciated if you can point it out.
> >>
> >> I'm not convinced such a #UD would be a significant problem: Looking
> >> at the disassembly will show the cause right away. The out of line
> >> ud2-s in some of VMX'es inline assembly wrappers are far worse.
> >
> > So, do you agree with the fixup section or not?
> 
> I'd rather not go that route, unless Andrew or you manage to
> convince me otherwise.
> 
> >> I think Andrew's "enforce"
> >> really means ASSERT() or BUG_ON(), again to avoid an unqualified
> >> exception. However - see above.
> >>
> >> Plus, all that said, without having seen the actual use sites of
> >> cmpxchg16b yet, I'm not at all convinced we really need this patch.
> >
> > After introducing posted format in IRTE, some fields exist in both the
> > High 64 bit and the low 64 bit,such as pda_h and pda_l, how to make
> > sure it is atomic when updating the pda field?
> 
> Is there a need for updating these _after_ initially setting up an
> entry?

Each time the guest sets the affinity, we need to change this
field to refer to the new destination.

Thanks,
Feng

> 
> Jan




[Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Jan Beulich
Rather than assuming only PV guests need special treatment (and
dealing with that directly when an IRQ gets set up), keep all guest MSI
IRQs masked until either the (HVM) guest unmasks them via vMSI or the
(PV, PVHVM, or PVH) guest sets up an event channel for it.

To not further clutter the common evtchn_bind_pirq() with x86-specific
code, introduce an arch_evtchn_bind_pirq() hook instead.

Reported-by: Sander Eikelenboom 
Signed-off-by: Jan Beulich 
Tested-by: Sander Eikelenboom 

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2502,6 +2502,25 @@ int unmap_domain_pirq_emuirq(struct doma
 return ret;
 }
 
+void arch_evtchn_bind_pirq(struct domain *d, int pirq)
+{
+int irq = domain_pirq_to_irq(d, pirq);
+struct irq_desc *desc;
+unsigned long flags;
+
+if ( irq <= 0 )
+return;
+
+if ( is_hvm_domain(d) )
+map_domain_emuirq_pirq(d, pirq, IRQ_PT);
+
+desc = irq_to_desc(irq);
+spin_lock_irqsave(&desc->lock, flags);
+if ( desc->msi_desc )
+guest_mask_msi_irq(desc, 0);
+spin_unlock_irqrestore(&desc->lock, flags);
+}
+
 bool_t hvm_domain_use_pirq(const struct domain *d, const struct pirq *pirq)
 {
 return is_hvm_domain(d) && pirq &&
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -422,10 +422,7 @@ void guest_mask_msi_irq(struct irq_desc 
 
 static unsigned int startup_msi_irq(struct irq_desc *desc)
 {
-bool_t guest_masked = (desc->status & IRQ_GUEST) &&
-  is_hvm_domain(desc->msi_desc->dev->domain);
-
-msi_set_mask_bit(desc, 0, guest_masked);
+msi_set_mask_bit(desc, 0, !!(desc->status & IRQ_GUEST));
 return 0;
 }
 
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -502,10 +502,7 @@ static long evtchn_bind_pirq(evtchn_bind
 
 bind->port = port;
 
-#ifdef CONFIG_X86
-if ( is_hvm_domain(d) && domain_pirq_to_irq(d, pirq) > 0 )
-map_domain_emuirq_pirq(d, pirq, IRQ_PT);
-#endif
+arch_evtchn_bind_pirq(d, pirq);
 
  out:
 spin_unlock(&d->event_lock);
--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d, 
 
 void arch_move_irqs(struct vcpu *v);
 
+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
+
 /* Set IRQ type for an SPI */
 int irq_set_spi_type(unsigned int spi, unsigned int type);
 
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
 unsigned int arch_hwdom_irqs(domid_t);
 #endif
 
+#ifndef arch_evtchn_bind_pirq
+void arch_evtchn_bind_pirq(struct domain *, int pirq);
+#endif
+
 #endif /* __XEN_IRQ_H__ */




Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Andrew Cooper
On 08/07/2015 09:45, Sander Eikelenboom wrote:
> Monday, July 6, 2015, 11:33:09 AM, you wrote:
>
> On 26.06.15 at 17:57,  wrote:
>>> On 2015-06-26 17:51, Jan Beulich wrote:
>>> On 26.06.15 at 17:41,  wrote:
> from 3.16 to 3.19 we gained a lot of these, if i remember correctly
> related to
> perf being enabled in the kernel:
>
> +   traps.c:2655:d0v0 Domain attempted WRMSR c081 from
> 0xe023e008 to 0x00230010.
> +   traps.c:2655:d0v0 Domain attempted WRMSR c082 from
> 0x82d0b000 to 0x81bc2670.
> +   traps.c:2655:d0v0 Domain attempted WRMSR c083 from
> 0x82d0b020 to 0x81bc4630.
 These are the SYSCALL (STAR) MSRs, which the kernel has no business
 touching when running on Xen.

> from 3.19 to 4.0 we gained:
> +   d0 attempted to change d0v0's CR4 flags 0660 -> 0760
> +   d0 attempted to change d0v1's CR4 flags 0660 -> 0760
> +   d0 attempted to change d0v2's CR4 flags 0660 -> 0760
> +   d0 attempted to change d0v3's CR4 flags 0660 -> 0760
> +   d0 attempted to change d0v4's CR4 flags 0660 -> 0760
> +   d0 attempted to change d0v5's CR4 flags 0660 -> 0760
 This is X86_CR4_PCE - not sure how to properly handle that.
 Andrew, you're fiddling with the CR4 handling right now anyway -
 any thoughts?

> and from 4.0 to 4.1 we gained the ones you were interested in:
> +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
> +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
> +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
> +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
> +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
> +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 For these to be meaningful you need to translate them to symbolic
 addresses. (And yes, we should see to make the code print them
 in a more useful manner.)
>>> How ?
>> addr2line against xen-syms (or xen.efi if you use that one). And of
>> course the result may need manual adjustment to account for
>> eventual patches you have in your tree.
>> Jan
> Ah yeah .. silly me .. somehow i had in mind it would be kernel addresses 
> instead of xen, so running it against vmlinux of course led nowhere.
>
> Here we go:
>
> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 
> 82d080239d85
> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 
> 82d080239d85
>
> which leads to:
> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583
> /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758
>
> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85
> ??:?

The second one is not.  It is the fixup label, which will be hidden away
out-of-line, and lacking debug symbols.

>
> Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:
>
> case MSR_EFER:
>  rdmsr_normal:
> /* Everyone can read the MSR space. */
> /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
> _p(regs->ecx));*/
> HERE -->if ( rdmsr_safe(regs->ecx, val) )
> goto fail;

Moving the printk into the fail case will identify which is the
problematic MSR.  We need the value of regs->_ecx here (the low 32bits,
not the full 64 as the commented printk currently has).
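
As an illustration of the point about the low 32 bits: RDMSR takes its index from ECX, so only the lower half of the 64-bit register is meaningful in the message. The following is a minimal standalone sketch, not Xen code; `msr_index_from_rcx` and `format_failed_rdmsr` are hypothetical names, and the message text is only an assumption about what such a log line might look like.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: RDMSR only consumes ECX, i.e. the low 32 bits of RCX. */
static uint32_t msr_index_from_rcx(uint64_t rcx)
{
    return (uint32_t)rcx;
}

/* Hypothetical helper: format a failure message using the 32-bit MSR index. */
static int format_failed_rdmsr(char *buf, size_t len, uint64_t rcx)
{
    return snprintf(buf, len,
                    "Domain attempted but failed RDMSR %08" PRIx32 ".",
                    msr_index_from_rcx(rcx));
}
```

Any stale upper bits in the register are dropped by the cast, so they cannot leak into the logged index.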

I have a small todo list of misc debugging improvements.  I will add
this to the list.

~Andrew

>  rdmsr_writeback:
> regs->eax = (uint32_t)val;
> regs->edx = (uint32_t)(val >> 32);
> break;
> }
> break;
>


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-4.1 test] 59143: regressions - FAIL

2015-07-08 Thread osstest service owner
flight 59143 linux-4.1 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59143/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail REGR. vs. 59031

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt  11 guest-start   fail REGR. vs. 59031

Tests which did not succeed, but are not blocking:
 test-amd64-i386-freebsd10-amd64       9 freebsd-install       fail  never pass
 test-amd64-i386-freebsd10-i386        9 freebsd-install       fail  never pass
 test-amd64-amd64-xl-pvh-intel        13 guest-saverestore     fail  never pass
 test-amd64-amd64-xl-pvh-amd          11 guest-start           fail  never pass
 test-amd64-i386-libvirt-xsm          11 guest-start           fail  never pass
 test-amd64-amd64-libvirt-xsm         12 migrate-support-check fail  never pass
 test-amd64-amd64-libvirt             12 migrate-support-check fail  never pass
 test-armhf-armhf-xl                  12 migrate-support-check fail  never pass
 test-armhf-armhf-xl-rtds             11 guest-start           fail  never pass
 test-armhf-armhf-xl-arndale          12 migrate-support-check fail  never pass
 test-armhf-armhf-libvirt-xsm         12 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu        12 migrate-support-check fail  never pass
 test-armhf-armhf-xl-xsm              12 migrate-support-check fail  never pass
 test-armhf-armhf-xl-credit2          12 migrate-support-check fail  never pass
 test-armhf-armhf-xl-cubietruck       12 migrate-support-check fail  never pass
 test-armhf-armhf-libvirt             12 migrate-support-check fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop            fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64  16 guest-stop            fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64  16 guest-stop            fail  never pass

version targeted for testing:
 linux                6a010c0abd49388a49af3d5a5bfc00e0d5767607
baseline version:
 linux                b953c0d234bc72e8489d3bf51a276c5c4ec85345

Last test of basis    59031  2015-07-02 23:39:59 Z    5 days
Testing same since    59054  2015-07-05 10:20:43 Z    2 days    3 attempts


People who touched revisions under test:
  Alexander Shishkin 
  Alexey Sokolov 
  Andi Kleen 
  Arnaldo Carvalho de Melo 
  Borislav Petkov 
  Borislav Petkov 
  Dmitry Tunin 
  Greg Kroah-Hartman 
  Imre Palik 
  Ingo Molnar 
  Jiri Olsa 
  Kalle Valo 
  Lukas Wunner 
  Marcel Holtmann 
  Oleg Nesterov 
  Palik, Imre 
  Peter Zijlstra (Intel) 
  Rafał Miłecki 

jobs:
 build-amd64-xsm                                          pass
 build-armhf-xsm                                          pass
 build-i386-xsm                                           pass
 build-amd64                                              pass
 build-armhf                                              pass
 build-i386                                               pass
 build-amd64-libvirt                                      pass
 build-armhf-libvirt                                      pass
 build-i386-libvirt                                       pass
 build-amd64-pvops                                        pass
 build-armhf-pvops                                        pass
 build-i386-pvops                                         pass
 build-amd64-rumpuserxen                                  pass
 build-i386-rumpuserxen                                   pass
 test-amd64-amd64-xl                                      pass
 test-armhf-armhf-xl                                      pass
 test-amd64-i386-xl                                       pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm            pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm             pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm            pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm             pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm    pass
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm     fail
 test-amd64-amd64-libvirt-xsm                             pass
 test-armhf-armhf-libvirt-xsm                             pass
 test-amd64-i386-libvirt-xsm                              fail
 test-amd64-amd64-xl-xsm                                  pass
 test-armhf-armhf-xl-xsm                                  pass
 test-amd64-i386-xl-xsm                                   pass
 test-amd64-amd64-xl-pvh-amd                              fail
 test-amd64-i386-qemut-rhel6hvm-amd                       pass
 test-amd64-i386-qemuu-rhel6hvm-amd                       pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64                pass
 test-amd64-i386-xl-qemut-debi

Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 10:45,  wrote:
> Here we go:
> 
> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 
> 82d080239d85
> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 
> 82d080239d85
> 
> which leads to:
> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583
> /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758
> 
> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85
> ??:?
> 
> Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:
> 
> case MSR_EFER:
>  rdmsr_normal:
> /* Everyone can read the MSR space. */
> /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
> _p(regs->ecx));*/
> HERE -->if ( rdmsr_safe(regs->ecx, val) )

Right, so as Andrew suspected - we won't know whether that's
legitimate/reasonable without knowing the MSR being accessed.

Jan




Re: [Xen-devel] [PATCH v2] net/bridge: Use __in6_dev_get rather than in6_dev_get in br_validate_ipv6

2015-07-08 Thread Pablo Neira Ayuso
On Tue, Jul 07, 2015 at 11:34:34AM -0700, Stephen Hemminger wrote:
> On Tue, 7 Jul 2015 15:55:21 +0100
> Julien Grall  wrote:
> 
> > The commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd "netfilter: bridge:
> > forward IPv6 fragmented packets" introduced a new function
> > br_validate_ipv6 which takes a reference on the inet6 device. However,
> > the reference is never released.
> > 
> > This makes it impossible to destroy any netdevice using
> > IPv6 and bridge.
> > 
> > It's possible to directly retrieve the inet6 device without taking a
> > reference as all netfilter hooks are protected by rcu_read_lock via
> > nf_hook_slow.
> > 
> > Spotted while trying to destroy a Xen guest on the upstream Linux:
> > "unregister_netdevice: waiting for vif1.0 to become free. Usage count = 1"
> > 
> > Signed-off-by: Julien Grall 
> > Cc: Bernhard Thaler 
> > Cc: Pablo Neira Ayuso 
> > Cc: f...@strlen.de
> > Cc: ian.campb...@citrix.com
> > Cc: wei.l...@citrix.com
> > Cc: Bob Liu 
> > 
> > ---
> > Note that it's impossible to create new guest after this message.
> > I'm not sure if it's normal.
> > 
> > Changes in v2:
> > - Don't take a reference to inet6.
> > - This was "net/bridge: Add missing in6_dev_put in
> > br_validate_ipv6" [0]
> > 
> > [0] https://lkml.org/lkml/2015/7/3/443
> > ---
> >  net/bridge/br_netfilter_ipv6.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> I like this simple solution
> 
> Acked-by: Stephen Hemminger 

Applied, thanks.



Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Andrew Cooper
On 08/07/2015 09:56, Jan Beulich wrote:
> Rather than assuming only PV guests need special treatment (and
> dealing with that directly when an IRQ gets set up), keep all guest MSI
> IRQs masked until either the (HVM) guest unmasks them via vMSI or the
> (PV, PVHVM, or PVH) guest sets up an event channel for it.
>
> To not further clutter the common evtchn_bind_pirq() with x86-specific
> code, introduce an arch_evtchn_bind_pirq() hook instead.
>
> Reported-by: Sander Eikelenboom 
> Signed-off-by: Jan Beulich 
> Tested-by: Sander Eikelenboom 

Reviewed-by: Andrew Cooper 



Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Chen, Tiejun

I'll update this next revision. And also rephrase this doc to address
your comments below.


FTR I think I indicated yesterday that I was satisfied with your
explanation for why type=none exists as an option even at the xl level,
namely that it allows us to change the default in the future.



Campbell,

Jackson had some different comments at this point,

#1. Rename "type" to "strategy " and "none" to "ignore"

--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
 (2, "PV"),
 ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")

+libxl_rdm_reserve_strategy = Enumeration("rdm_reserve_strategy", [
+(0, "ignore"),
+(1, "host"),
+])
+
...
 libxl_channel_connection = Enumeration("channel_connection", [
 (0, "UNKNOWN"),
 (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
 ("vcpus", libxl_bitmap), # vcpus in this node
 ])

+libxl_rdm_reserve = Struct("rdm_reserve", [
+("strategy",libxl_rdm_reserve_strategy),
+("reserve", libxl_rdm_reserve_flag),
+])
+

#2. Don't expose "ignore" to user and just keep "host" as the default

He told me he would discuss this with you, but it sounds like he didn't, 
or am I missing something here?


Thanks
Tiejun



Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Julien Grall

Hi,

On 08/07/2015 09:56, Jan Beulich wrote:

--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,

  void arch_move_irqs(struct vcpu *v);

+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
+


This addition is here in order to ensure that d and pirq are evaluated, 
right?


If so, I didn't find it obvious. Why didn't you use a 
static inline? Or maybe add a comment explicitly saying this is not 
implemented.
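
For comparison, the two forms under discussion can be sketched side by side. This is a standalone illustration, not the Xen headers: `struct domain` here is a stand-in with a made-up field, and the `_macro` / `_inline` suffixes are hypothetical names added only to let both variants coexist in one file.

```c
/* Stand-in for Xen's struct domain; the field is illustrative only. */
struct domain { int domain_id; };

/* Macro form from the patch: evaluates both arguments, discards the result. */
#define arch_evtchn_bind_pirq_macro(d, pirq) ((void)((d) + (pirq)))

/* Alternative form: a no-op that still type-checks its arguments. */
static inline void arch_evtchn_bind_pirq_inline(struct domain *d, int pirq)
{
    (void)d;     /* intentionally unused: nothing to do on this architecture */
    (void)pirq;
}
```

Both variants evaluate their arguments exactly once. The inline function additionally rejects arguments of the wrong type at compile time, whereas the macro accepts anything for which `+` is defined.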


Regards,

--
Julien Grall



Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> Currently, we don't support urgent interrupt, all interrupts
> are recognized as non-urgent interrupt, so we cannot send
> posted-interrupt when 'SN' is set.
> 
> Signed-off-by: Feng Wu 
> ---
> v3:
> use cmpxchg to test SN/ON and set ON
> 
>  xen/arch/x86/hvm/vmx/vmx.c | 32 
>  1 file changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 0837627..b94ef6a 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu 
> *v)
> 
>  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
>  {
> +struct pi_desc old, new, prev;
> +

move to 'else if'.

>  if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
>  return;
> 
> @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8
> vector)
>   */
>  pi_set_on(&v->arch.hvm_vmx.pi_desc);
>  }
> -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> +else
>  {
> +prev.control = 0;
> +
> +do {
> +old.control = v->arch.hvm_vmx.pi_desc.control &
> +  ~(1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN);
> +new.control = v->arch.hvm_vmx.pi_desc.control |
> +  1 << POSTED_INTR_ON;
> +
> +/*
> + * Currently, we don't support urgent interrupt, all
> + * interrupts are recognized as non-urgent interrupt,
> + * so we cannot send posted-interrupt when 'SN' is set.
> + * Besides that, if 'ON' is already set, we cannot set
> + * posted-interrupts as well.
> + */
> +if ( prev.sn || prev.on )
> +{
> +vcpu_kick(v);
> +return;
> +}

would it make more sense to move above check after cmpxchg?

> +
> +prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
> +   old.control, new.control);
> +} while ( prev.control != old.control );
> +
>  __vmx_deliver_posted_interrupt(v);
> -return;
>  }
> -
> -vcpu_kick(v);
>  }
> 
>  static void vmx_sync_pir_to_irr(struct vcpu *v)
> --
> 2.1.0
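
One possible way to order the SN/ON check relative to the compare-and-exchange can be sketched with C11 atomics. This is a standalone sketch under stated assumptions, not the Xen implementation: the bit positions `PI_ON` and `PI_SN` and the function name `pi_try_set_on` are made up for illustration, and `atomic_compare_exchange_weak` stands in for Xen's `cmpxchg`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_ON (UINT64_C(1) << 0)  /* assumed bit positions, illustration only */
#define PI_SN (UINT64_C(1) << 1)

/*
 * Try to set ON atomically.  Returns true if the caller should send the
 * notification, false if ON or SN was already set (in which case the caller
 * would fall back to kicking the vCPU).
 */
static bool pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        /* Decide based on the value actually observed in memory. */
        if ( old & (PI_ON | PI_SN) )
            return false;
        /* On failure the compare-exchange reloads 'old' for the next pass. */
    } while ( !atomic_compare_exchange_weak(control, &old, old | PI_ON) );

    return true;
}
```

Here the check always runs against a value read from the descriptor, either by the initial load or by a failed exchange, so no separate `prev` variable with a dummy initial value is needed.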




Re: [Xen-devel] [v3 09/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> Extend struct iremap_entry according to VT-d Posted-Interrupts Spec.
> 
> Signed-off-by: Feng Wu 

Acked-by: Kevin Tian 




Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Campbell
On Wed, 2015-07-08 at 17:06 +0800, Chen, Tiejun wrote:

> #2. Don't expose "ignore" to user and just keep "host" as the default
> 
> He told me he would discuss this with you, but sounds he didn't do this, 
> or I'm missing something here?

My question was regarding how xl rdm="type=none" differed from not
saying anything (i.e. getting the default). You explained that this was
useful to allow the default to be changed, which I agreed with.

The question regarding the actual naming of the options at either the
xl level or the libxl level (which seems to be what Ian J's comments were on)
is orthogonal to the question of whether there should be a way to
explicitly ask for the default (as opposed to implicitly asking for it
by omission of the option).

Ian.





Re: [Xen-devel] [PATCH v4 1/7] libxl: get rid of the SEDF scheduler

2015-07-08 Thread Ian Campbell
On Tue, 2015-07-07 at 18:43 +0200, Dario Faggioli wrote:
> only the interface is left in place, for backward
> compile-time compatibility, but every attempt to
> use it would throw an error.
> 
> Signed-off-by: Dario Faggioli 
> ---
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Stefano Stabellini 

Acked-by: Ian Campbell 

> Cc: Wei Liu 
> 
> Changes from v3:
>  - drop George's Rev-by: which should not be there since v2;
>  - better grouping of fields in libxl_domain_sched_params, as
>suggested during review;
>  - improved comment for ERROR_FEATURE_REMOVED, as suggested
>during review.
> 
> Changes from v2:
>  - introduce and use ERROR_FEATURE_REMOVED, as requested
>during review;
>  - mark the SEDF only parameter as deprecated in libxl_types.idl,
>as requested during review.
> ---
>  tools/libxl/libxl.c |   73 
> ++-
>  tools/libxl/libxl_create.c  |   61 
>  tools/libxl/libxl_types.idl |8 -
>  3 files changed, 11 insertions(+), 131 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 3a83903..38aff8d 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5728,73 +5728,6 @@ static int sched_credit2_domain_set(libxl__gc *gc, 
> uint32_t domid,
>  return 0;
>  }
>  
> -static int sched_sedf_domain_get(libxl__gc *gc, uint32_t domid,
> - libxl_domain_sched_params *scinfo)
> -{
> -uint64_t period;
> -uint64_t slice;
> -uint64_t latency;
> -uint16_t extratime;
> -uint16_t weight;
> -int rc;
> -
> -rc = xc_sedf_domain_get(CTX->xch, domid, &period, &slice, &latency,
> -&extratime, &weight);
> -if (rc != 0) {
> -LOGE(ERROR, "getting domain sched sedf");
> -return ERROR_FAIL;
> -}
> -
> -libxl_domain_sched_params_init(scinfo);
> -scinfo->sched = LIBXL_SCHEDULER_SEDF;
> -scinfo->period = period / 100;
> -scinfo->slice = slice / 100;
> -scinfo->latency = latency / 100;
> -scinfo->extratime = extratime;
> -scinfo->weight = weight;
> -
> -return 0;
> -}
> -
> -static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
> - const libxl_domain_sched_params *scinfo)
> -{
> -uint64_t period;
> -uint64_t slice;
> -uint64_t latency;
> -uint16_t extratime;
> -uint16_t weight;
> -
> -int ret;
> -
> -ret = xc_sedf_domain_get(CTX->xch, domid, &period, &slice, &latency,
> -&extratime, &weight);
> -if (ret != 0) {
> -LOGE(ERROR, "getting domain sched sedf");
> -return ERROR_FAIL;
> -}
> -
> -if (scinfo->period != LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT)
> -period = (uint64_t)scinfo->period * 100;
> -if (scinfo->slice != LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT)
> -slice = (uint64_t)scinfo->slice * 100;
> -if (scinfo->latency != LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT)
> -latency = (uint64_t)scinfo->latency * 100;
> -if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT)
> -extratime = scinfo->extratime;
> -if (scinfo->weight != LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT)
> -weight = scinfo->weight;
> -
> -ret = xc_sedf_domain_set(CTX->xch, domid, period, slice, latency,
> -extratime, weight);
> -if ( ret < 0 ) {
> -LOGE(ERROR, "setting domain sched sedf");
> -return ERROR_FAIL;
> -}
> -
> -return 0;
> -}
> -
>  static int sched_rtds_domain_get(libxl__gc *gc, uint32_t domid,
> libxl_domain_sched_params *scinfo)
>  {
> @@ -5873,7 +5806,8 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, 
> uint32_t domid,
>  
>  switch (sched) {
>  case LIBXL_SCHEDULER_SEDF:
> -ret=sched_sedf_domain_set(gc, domid, scinfo);
> +LOG(ERROR, "SEDF scheduler no longer available");
> +ret=ERROR_FEATURE_REMOVED;
>  break;
>  case LIBXL_SCHEDULER_CREDIT:
>  ret=sched_credit_domain_set(gc, domid, scinfo);
> @@ -5909,7 +5843,8 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, 
> uint32_t domid,
>  
>  switch (scinfo->sched) {
>  case LIBXL_SCHEDULER_SEDF:
> -ret=sched_sedf_domain_get(gc, domid, scinfo);
> +LOG(ERROR, "SEDF scheduler no longer available");
> +ret=ERROR_FEATURE_REMOVED;
>  break;
>  case LIBXL_SCHEDULER_CREDIT:
>  ret=sched_credit_domain_get(gc, domid, scinfo);
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 9c2303c..3f31a3b 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -50,61 +50,6 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc,
>  return 0;
>  }
>  
> -static int sched_params_valid(libxl__gc *gc,
> -  uint32_t domid, libxl_domain_sched_para

[Xen-devel] [PATCH] x86: correct socket_cpumask allocation for AP

2015-07-08 Thread Chao Peng
For AP, phys_proc_id is still not valid in CPU_PREPARE notifier
(cpu_smpboot_alloc), so cpu_to_socket(cpu) is not valid as well.

Introduce a pre-allocated secondary_socket_cpumask so that later in
smp_store_cpu_info() socket_cpumask[socket] can consume it.

Signed-off-by: Chao Peng 
---
This is targeted for staging branch.
I tested on a 2-socket machine and it looks fine.
---
 xen/arch/x86/smpboot.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c73aa1b..49b8497 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -62,6 +62,7 @@ EXPORT_SYMBOL(cpu_online_map);
 
 unsigned int __read_mostly nr_sockets;
 cpumask_var_t *__read_mostly socket_cpumask;
+static cpumask_var_t secondary_socket_cpumask;
 
 struct cpuinfo_x86 cpu_data[NR_CPUS];
 
@@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
 static void smp_store_cpu_info(int id)
 {
 struct cpuinfo_x86 *c = cpu_data + id;
+unsigned int socket;
 
 *c = boot_cpu_data;
 if ( id != 0 )
+{
 identify_cpu(c);
 
+socket = cpu_to_socket(id);
+if ( !socket_cpumask[socket] )
+{
+socket_cpumask[socket] = secondary_socket_cpumask;
+secondary_socket_cpumask = NULL;
+}
+}
+
 /*
  * Certain Athlons might work (for various values of 'work') in SMP
  * but they are not certified as MP capable.
@@ -705,7 +716,6 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 nodeid_t node = cpu_to_node(cpu);
 struct desc_struct *gdt;
 unsigned long stub_page;
-unsigned int socket = cpu_to_socket(cpu);
 
 if ( node != NUMA_NO_NODE )
 memflags = MEMF_node(node);
@@ -748,8 +758,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 goto oom;
 per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
 
-if ( !socket_cpumask[socket] &&
- !zalloc_cpumask_var(socket_cpumask + socket) )
+if ( !secondary_socket_cpumask &&
+ !zalloc_cpumask_var(&secondary_socket_cpumask) )
 goto oom;
 
 if ( zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
-- 
1.9.1
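
The pre-allocate-then-consume pattern used by the patch above can be sketched in isolation. This is a simplified standalone model, not the Xen code: `NR_SOCKETS`, `mask_t`, `prepare_cpu` and `store_cpu_info` are hypothetical stand-ins, and `calloc` replaces `zalloc_cpumask_var`.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NR_SOCKETS 4            /* assumed size, for illustration */

typedef unsigned long mask_t;   /* stand-in for cpumask storage */

static mask_t *socket_mask[NR_SOCKETS];
static mask_t *spare_mask;

/* Preparation step: the socket id is not known yet, so keep one spare ready. */
static bool prepare_cpu(void)
{
    if ( !spare_mask )
        spare_mask = calloc(1, sizeof(*spare_mask));
    return spare_mask != NULL;
}

/* Later step: the socket id is known; hand over the spare if the slot is empty. */
static void store_cpu_info(unsigned int socket)
{
    if ( !socket_mask[socket] )
    {
        socket_mask[socket] = spare_mask;
        spare_mask = NULL;
    }
}
```

If the socket already has a mask, the spare stays in place and is reused by the next `prepare_cpu()` call, so at most one unused allocation is outstanding at any time.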




Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread David Vrabel
On 08/07/15 09:56, Jan Beulich wrote:
> Rather than assuming only PV guests need special treatment (and
> dealing with that directly when an IRQ gets set up), keep all guest MSI
> IRQs masked until either the (HVM) guest unmasks them via vMSI or the
> (PV, PVHVM, or PVH) guest sets up an event channel for it.
> 
> To not further clutter the common evtchn_bind_pirq() with x86-specific
> code, introduce an arch_evtchn_bind_pirq() hook instead.

Can you describe the symptoms of the bug being fixed here?

> --- a/xen/include/asm-arm/irq.h
> +++ b/xen/include/asm-arm/irq.h
> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d, 
>  
>  void arch_move_irqs(struct vcpu *v);
>  
> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))

Would this be better as an inline function?

> +
>  /* Set IRQ type for an SPI */
>  int irq_set_spi_type(unsigned int spi, unsigned int type);
>  
> --- a/xen/include/xen/irq.h
> +++ b/xen/include/xen/irq.h
> @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
>  unsigned int arch_hwdom_irqs(domid_t);
>  #endif
>  
> +#ifndef arch_evtchn_bind_pirq
> +void arch_evtchn_bind_pirq(struct domain *, int pirq);

... moving this into xen/include/asm-x86/irq.h

David



Re: [Xen-devel] [PATCH v10 00/13] enable Cache Allocation Technology (CAT) for VMs

2015-07-08 Thread Chao Peng
On Tue, Jul 07, 2015 at 03:46:21PM +0100, Ian Campbell wrote:
> On Fri, 2015-06-26 at 16:43 +0800, Chao Peng wrote:
> > Chao Peng (13):
> >   x86: add socket_cpumask
> >   x86: detect and initialize Intel CAT feature
> >   x86: maintain COS to CBM mapping for each socket
> >   x86: add COS information for each domain
> >   x86: expose CBM length and COS number information
> >   x86: dynamically get/set CBM for a domain
> >   x86: add scheduling support for Intel CAT
> >   xsm: add CAT related xsm policies
> 
> Jan applied to here.
> 
> So I was going to apply these 5:
> 
> >   tools/libxl: minor name changes for CMT commands
> >   tools/libxl: add command to show PSR hardware info
> >   tools/libxl: introduce some socket helpers
> >   tools: add tools support for Intel CAT
> >   docs: add xl-psr.markdown
> 
> But, on i686 I see:
> 
> xl_cmdimpl.c: In function ‘psr_cat_hwinfo’:
> xl_cmdimpl.c:8390:16: error: format ‘%llx’ expects argument of type ‘long 
> long unsigned int’, but argument 3 has type ‘long unsigned int’ 
> [-Werror=format=]
> (1ul << info->cbm_len) - 1);
> ^
> xl_cmdimpl.c: In function ‘psr_cat_print_socket’:
> xl_cmdimpl.c:8450:5: error: format ‘%llx’ expects argument of type ‘long long 
> unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
>  printf("%-16s: %#"PRIx64"\n", "Default CBM", (1ul << info->cbm_len) - 1);
>  ^
> cc1: all warnings being treated as errors
> 
> It seems there is some mismatch between your types and the printf
> formats used.
> 
> The appropriate format specifier for an unsigned long (which you have
> from the "ul" in the constant) is %#lx and not "%#"PRIxXX which is
> associated with uintXX_t types.
> 
> If you need a 64 bit type then you might have meant instead to use "ull"
> in which case you want "%#llx" as the format specifier.

This is what I need. Thanks for the suggestion.

Chao
> 
> If you really want/need an exactly 64 bit type then you'll have to do
> some nasty casting, something like "((uint64_t)1) << info->cbm_len) - 1"
> or something, that's pretty ugly though. If you have to go this route
> then please test both builds, in case I've gotten my ()'s wrong.
> 
> Ian.
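
The exact-width variant mentioned in the quoted review can be sketched as follows. This is a minimal standalone example, not the xl code: `default_cbm` and `format_default_cbm` are hypothetical helpers, and it assumes `cbm_len` is always below 64.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: mask with the low cbm_len bits set (cbm_len < 64). */
static uint64_t default_cbm(unsigned int cbm_len)
{
    return (UINT64_C(1) << cbm_len) - 1;
}

/* Hypothetical helper: format the mask the way the quoted printf does. */
static int format_default_cbm(char *buf, size_t len, unsigned int cbm_len)
{
    /* A uint64_t argument pairs with PRIx64 on 32-bit and 64-bit builds. */
    return snprintf(buf, len, "%-16s: %#" PRIx64, "Default CBM",
                    default_cbm(cbm_len));
}
```

Because the argument type is `uint64_t` rather than `unsigned long`, the format string and the argument agree regardless of how wide `long` is on the build target.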
> 
> 



Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
> 
> Signed-off-by: Feng Wu 

Acked-by: Kevin Tian , with one small comment:

> +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec)
> +{
> +struct irq_desc *desc;
> +struct msi_desc *msi_desc;
> +int remap_index;
> +int rc = 0;
> +struct pci_dev *pci_dev;
> +struct acpi_drhd_unit *drhd;
> +struct iommu *iommu;
> +struct ir_ctrl *ir_ctrl;
> +struct iremap_entry *iremap_entries = NULL, *p = NULL;
> +struct iremap_entry new_ire;
> +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +unsigned long flags;
> +uint128_t old_ire, ret;
> +
> +desc = pirq_spin_lock_irq_desc(pirq, NULL);
> +if ( !desc )
> +return -ENOMEM;

-EINVAL?





Re: [Xen-devel] [PATCH v10 00/13] enable Cache Allocation Technology (CAT) for VMs

2015-07-08 Thread Wei Liu
On Wed, Jul 08, 2015 at 05:40:47PM +0800, Chao Peng wrote:
> On Tue, Jul 07, 2015 at 03:46:21PM +0100, Ian Campbell wrote:
> > On Fri, 2015-06-26 at 16:43 +0800, Chao Peng wrote:
> > > Chao Peng (13):
> > >   x86: add socket_cpumask
> > >   x86: detect and initialize Intel CAT feature
> > >   x86: maintain COS to CBM mapping for each socket
> > >   x86: add COS information for each domain
> > >   x86: expose CBM length and COS number information
> > >   x86: dynamically get/set CBM for a domain
> > >   x86: add scheduling support for Intel CAT
> > >   xsm: add CAT related xsm policies
> > 
> > Jan applied to here.
> > 
> > So I was going to apply these 5:
> > 
> > >   tools/libxl: minor name changes for CMT commands
> > >   tools/libxl: add command to show PSR hardware info
> > >   tools/libxl: introduce some socket helpers
> > >   tools: add tools support for Intel CAT
> > >   docs: add xl-psr.markdown
> > 
> > But, on i686 I see:
> > 
> > xl_cmdimpl.c: In function ‘psr_cat_hwinfo’:
> > xl_cmdimpl.c:8390:16: error: format ‘%llx’ expects argument of type ‘long 
> > long unsigned int’, but argument 3 has type ‘long unsigned int’ 
> > [-Werror=format=]
> > (1ul << info->cbm_len) - 1);
> > ^
> > xl_cmdimpl.c: In function ‘psr_cat_print_socket’:
> > xl_cmdimpl.c:8450:5: error: format ‘%llx’ expects argument of type ‘long 
> > long unsigned int’, but argument 3 has type ‘long unsigned int’ 
> > [-Werror=format=]
> >  printf("%-16s: %#"PRIx64"\n", "Default CBM", (1ul << info->cbm_len) - 
> > 1);
> >  ^
> > cc1: all warnings being treated as errors
> > 
> > It seems there is some mismatch between your types and the printf
> > formats used.
> > 
> > The appropriate format specifier for an unsigned long (which you have
> > from the "ul" in the constant) is %#lx and not "%#"PRIxXX which is
> > associated with uintXX_t types.
> > 
> > If you need a 64 bit type then you might have meant instead to use "ull"
> > in which case you want "%#llx" as the format specifier.
> 
> This is what I need. Thanks for suggestion.
> 

Chao, 4.6 freeze is on Friday. Can you fix that minor bug and
repost your series within two days?

Wei.

> Chao
> > 
> > If you really want/need an exactly 64 bit type then you'll have to do
> > some nasty casting, something like "((uint64_t)1) << info->cbm_len) - 1"
> > or something, that's pretty ugly though. If you have to go this route
> > then please test both builds, in case I've gotten my ()'s wrong.
> > 
> > Ian.
> > 
> > 



Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Sander Eikelenboom

Wednesday, July 8, 2015, 10:58:02 AM, you wrote:

> On 08/07/2015 09:45, Sander Eikelenboom wrote:
>> Monday, July 6, 2015, 11:33:09 AM, you wrote:
>>
>> On 26.06.15 at 17:57,  wrote:
 On 2015-06-26 17:51, Jan Beulich wrote:
 On 26.06.15 at 17:41,  wrote:
>> from 3.16 to 3.19 we gained a lot of these, if i remember correctly
>> related to
>> perf being enabled in the kernel:
>>
>> +   traps.c:2655:d0v0 Domain attempted WRMSR c081 from
>> 0xe023e008 to 0x00230010.
>> +   traps.c:2655:d0v0 Domain attempted WRMSR c082 from
>> 0x82d0b000 to 0x81bc2670.
>> +   traps.c:2655:d0v0 Domain attempted WRMSR c083 from
>> 0x82d0b020 to 0x81bc4630.
> These are the SYSCALL (STAR) MSRs, which the kernel has no business
> touching when running on Xen.
>
>> from 3.19 to 4.0 we gained:
>> +   d0 attempted to change d0v0's CR4 flags 0660 -> 0760
>> +   d0 attempted to change d0v1's CR4 flags 0660 -> 0760
>> +   d0 attempted to change d0v2's CR4 flags 0660 -> 0760
>> +   d0 attempted to change d0v3's CR4 flags 0660 -> 0760
>> +   d0 attempted to change d0v4's CR4 flags 0660 -> 0760
>> +   d0 attempted to change d0v5's CR4 flags 0660 -> 0760
> This is X86_CR4_PCE - not sure how to properly handle that.
> Andrew, you're fiddling with the CR4 handling right now anyway -
> any thoughts?
>
>> and from 4.0 to 4.1 we gained the ones you were interested in:
>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
> For these to be meaningful you need to translate them to symbolic
> addresses. (And yes, we should see to make the code print them
> in a more useful manner.)
 How ?
>>> addr2line against xen-syms (or xen.efi if you use that one). And of
>>> course the result may need manual adjustment to account for
>>> eventual patches you have in your tree.
>>> Jan
>> Ah yeah .. silly me .. somehow i had in mind it would be kernel addresses 
>> instead of xen, so running it against vmlinux of course led nowhere.
>>
>> Here we go:
>>
>> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 
>> -> 82d080239d85
>> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 
>> -> 82d080239d85
>>
>> which leads to:
>> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583
>> /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758
>>
>> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85
>> ??:?

> The second one is not.  It is the fixup label, which will be hidden away
> out-of-line, and lacking debug symbols.

>>
>> Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:
>>
>> case MSR_EFER:
>>  rdmsr_normal:
>> /* Everyone can read the MSR space. */
>> /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
>> _p(regs->ecx));*/
>> HERE -->if ( rdmsr_safe(regs->ecx, val) )
>> goto fail;

> Moving the printk into the fail case will identify which is the
> problematic MSR.  We need the value of regs->_ecx here (the low 32bits,
> not the full 64 as the commented printk currently has).

> I have a small todo list of misc debugging improvements.  I will add
> this to the list.

> ~Andrew

>>  rdmsr_writeback:
>> regs->eax = (uint32_t)val;
>> regs->edx = (uint32_t)(val >> 32);
>> break;
>> }
>> break;
>>

Don't know if the full 64bits is of equal use, but here it is:

(XEN) [2015-07-08 10:01:58.717] traps.c:2760:d14v0 Domain attempted but failed RDMSR 0570.


--
Sander


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 5:06 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 08/15] Suppress posting interrupts when 'SN' is set
> 
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > Currently, we don't support urgent interrupt, all interrupts
> > are recognized as non-urgent interrupt, so we cannot send
> > posted-interrupt when 'SN' is set.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > use cmpxchg to test SN/ON and set ON
> >
> >  xen/arch/x86/hvm/vmx/vmx.c | 32 
> >  1 file changed, 28 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 0837627..b94ef6a 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct
> vcpu *v)
> >
> >  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> >  {
> > +struct pi_desc old, new, prev;
> > +
> 
> move to 'else if'.
> 
> >  if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> >  return;
> >
> > @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu
> *v, u8
> > vector)
> >   */
> >  pi_set_on(&v->arch.hvm_vmx.pi_desc);
> >  }
> > -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> > +else
> >  {
> > +prev.control = 0;
> > +
> > +do {
> > +old.control = v->arch.hvm_vmx.pi_desc.control &
> > +  ~(1 << POSTED_INTR_ON | 1 <<
> POSTED_INTR_SN);
> > +new.control = v->arch.hvm_vmx.pi_desc.control |
> > +  1 << POSTED_INTR_ON;
> > +
> > +/*
> > + * Currently, we don't support urgent interrupt, all
> > + * interrupts are recognized as non-urgent interrupt,
> > + * so we cannot send posted-interrupt when 'SN' is set.
> > + * Besides that, if 'ON' is already set, we cannot set
> > + * posted-interrupts as well.
> > + */
> > +if ( prev.sn || prev.on )
> > +{
> > +vcpu_kick(v);
> > +return;
> > +}
> 
> would it make more sense to move above check after cmpxchg?

My original idea is that we only need to do the check when
prev.control != old.control, which means the cmpxchg did not
complete successfully. If we add the check between the cmpxchg
and while ( prev.control != old.control ), the logic seems
less clear, since we don't need to check prev.sn and prev.on
when cmpxchg succeeds in setting the new value.
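
For illustration, a minimal standalone sketch of the test-SN/ON-then-set-ON
loop discussed here. It uses C11 atomics rather than Xen's cmpxchg(), and
the names pi_try_set_on, PI_ON and PI_SN as well as the bit positions are
assumptions made for the example, not the definitions from the patch. The
SN/ON check runs before the first attempt and again only after a failed
compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions inside the 64-bit control word (illustrative). */
#define PI_ON (UINT64_C(1) << 0)
#define PI_SN (UINT64_C(1) << 1)

/*
 * Try to set ON atomically.
 * Returns true if ON was newly set (the caller would then send the
 * notification), false if SN or ON was already set (the caller would
 * fall back to kicking the vcpu).
 */
static bool pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        /* Re-evaluated with the fresh value after every failed exchange. */
        if ( old & (PI_SN | PI_ON) )
            return false;
    } while ( !atomic_compare_exchange_weak(control, &old, old | PI_ON) );

    return true;
}
```

On a failed exchange, atomic_compare_exchange_weak() stores the current
value into old, so no separate "prev" variable is needed in this variant.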

Thanks,
Feng

> 
> > +
> > +prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
> > +   old.control, new.control);
> > +} while ( prev.control != old.control );
> > +
> >  __vmx_deliver_posted_interrupt(v);
> > -return;
> >  }
> > -
> > -vcpu_kick(v);
> >  }
> >
> >  static void vmx_sync_pir_to_irr(struct vcpu *v)
> > --
> > 2.1.0




Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 6:00 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
> 
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > This patch adds an API which is used to update the IRTE
> > for posted-interrupt when guest changes MSI/MSI-X information.
> >
> > Signed-off-by: Feng Wu 
> 
> Acked-by: Kevin Tian, with one small comment:
> 
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec)
> > +{
> > +struct irq_desc *desc;
> > +struct msi_desc *msi_desc;
> > +int remap_index;
> > +int rc = 0;
> > +struct pci_dev *pci_dev;
> > +struct acpi_drhd_unit *drhd;
> > +struct iommu *iommu;
> > +struct ir_ctrl *ir_ctrl;
> > +struct iremap_entry *iremap_entries = NULL, *p = NULL;
> > +struct iremap_entry new_ire;
> > +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +unsigned long flags;
> > +uint128_t old_ire, ret;
> > +
> > +desc = pirq_spin_lock_irq_desc(pirq, NULL);
> > +if ( !desc )
> > +return -ENOMEM;
> 
> -EINVAL?
> 

I think -EINVAL is reasonable.

Thanks,
Feng




Re: [Xen-devel] [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.

2015-07-08 Thread Tim Deegan
Hi,

At 17:38 + on 07 Jul (1436290689), Sahita, Ravi wrote:
> In order to make forward progress, do the other maintainers (Jan,
> Andrew, Tim) agree with the patch direction that George has
> suggested for this particular patch?

I'm no longer a maintainer for this code, but FWIW I think that this
direction (adding a new argument to the internal APIs rather than
adding new internal APIs) is correct.

Because the sve bit must be _set_ to get the old/default behaviour, I
think the p2m_pt implementation should always return sve = 1 on _get
and possibly also assert sve != 0 on _set.
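
A rough sketch of that suggestion, purely illustrative: pt_entry,
pt_get_entry and pt_set_entry are hypothetical stand-ins, not the real
p2m_pt functions or their signatures. The getter always reports sve = 1
and the setter asserts that the caller did not ask for sve = 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page-table backed p2m entry. */
struct pt_entry {
    unsigned long mfn;
};

/* _get: this backend cannot clear suppress_ve, so always report it set. */
static unsigned long pt_get_entry(const struct pt_entry *e, bool *sve)
{
    if ( sve )
        *sve = true;
    return e->mfn;
}

/* _set: the old/default behaviour requires sve to be set, so insist on it. */
static void pt_set_entry(struct pt_entry *e, unsigned long mfn, bool sve)
{
    assert(sve);
    e->mfn = mfn;
}
```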

Cheers,

Tim.



Re: [Xen-devel] [PATCH v5 2/3] arm: Allow the user to specify the GIC version

2015-07-08 Thread Ian Campbell
On Tue, 2015-07-07 at 17:22 +0100, Julien Grall wrote:
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index e1632fa..11f6461 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -369,6 +369,12 @@ libxl_vnode_info = Struct("vnode_info", [
>  ("vcpus", libxl_bitmap), # vcpus in this node
>  ])
>  
> +libxl_gic_version = Enumeration("gic_version", [
> +(0, "DEFAULT"),
> +(0x20, "v2"),
> +(0x30, "v3")
> +], init_val = "LIBXL_GIC_VERSION_DEFAULT")
> +
>  libxl_domain_build_info = Struct("domain_build_info",[
>  ("max_vcpus",   integer),
>  ("avail_vcpus", libxl_bitmap),
> @@ -480,6 +486,11 @@ libxl_domain_build_info = Struct("domain_build_info",[
>])),
>   ("invalid", None),
>   ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
> +
> +
> +("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
> +  ])),
> +
>  ], dir=DIR_IN

This results in the following when building the ocaml bindings:

Traceback (most recent call last):
  File "genwrap.py", line 529, in <module>
ml.write(gen_ocaml_ml(ty, False))
  File "genwrap.py", line 217, in gen_ocaml_ml
s += gen_struct(ty)
  File "genwrap.py", line 119, in gen_struct
x = ocaml_instance_of_field(f)
  File "genwrap.py", line 112, in ocaml_instance_of_field
return "%s : %s" % (munge_name(name), ocaml_type_of(f.type))
  File "genwrap.py", line 90, in ocaml_type_of
return ty.rawname.capitalize() + ".t"
AttributeError: 'NoneType' object has no attribute 'capitalize'
make[7]: *** No rule to make target '_libxl_types.ml.in', needed by 'xenlight.ml'.  Stop.

I'll take a look.





Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Andrew Cooper
On 08/07/2015 11:04, Sander Eikelenboom wrote:
> Wednesday, July 8, 2015, 10:58:02 AM, you wrote:
>
>> On 08/07/2015 09:45, Sander Eikelenboom wrote:
>>> Monday, July 6, 2015, 11:33:09 AM, you wrote:
>>>
>>> On 26.06.15 at 17:57,  wrote:
> On 2015-06-26 17:51, Jan Beulich wrote:
> On 26.06.15 at 17:41,  wrote:
>>> from 3.16 to 3.19 we gained a lot of these, if i remember correctly
>>> related to
>>> perf being enabled in the kernel:
>>>
>>> +   traps.c:2655:d0v0 Domain attempted WRMSR c081 from
>>> 0xe023e008 to 0x00230010.
>>> +   traps.c:2655:d0v0 Domain attempted WRMSR c082 from
>>> 0x82d0b000 to 0x81bc2670.
>>> +   traps.c:2655:d0v0 Domain attempted WRMSR c083 from
>>> 0x82d0b020 to 0x81bc4630.
>> These are the SYSCALL (STAR) MSRs, which the kernel has no business
>> touching when running on Xen.
>>
>>> from 3.19 to 4.0 we gained:
>>> +   d0 attempted to change d0v0's CR4 flags 0660 -> 0760
>>> +   d0 attempted to change d0v1's CR4 flags 0660 -> 0760
>>> +   d0 attempted to change d0v2's CR4 flags 0660 -> 0760
>>> +   d0 attempted to change d0v3's CR4 flags 0660 -> 0760
>>> +   d0 attempted to change d0v4's CR4 flags 0660 -> 0760
>>> +   d0 attempted to change d0v5's CR4 flags 0660 -> 0760
>> This is X86_CR4_PCE - not sure how to properly handle that.
>> Andrew, you're fiddling with the CR4 handling right now anyway -
>> any thoughts?
>>
>>> and from 4.0 to 4.1 we gained the ones you were interested in:
>>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>>> +   traps.c:3227: GPF (): 82d080194a4d -> 82d080239d85
>> For these to be meaningful you need to translate them to symbolic
>> addresses. (And yes, we should see to make the code print them
>> in a more useful manner.)
> How ?
 addr2line against xen-syms (or xen.efi if you use that one). And of
 course the result may need manual adjustment to account for
 eventual patches you have in your tree.
 Jan
>>> Ah yeah .. silly me .. somehow i had in mind it would be kernel addresses 
>>> instead of xen, so running it against vmlinux of course lead no where.
>>>
>>> Here we go:
>>>
>>> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 
>>> -> 82d080239d85
>>> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 
>>> -> 82d080239d85
>>>
>>> which leads to:
>>> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583
>>> /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758
>>>
>>> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85
>>> ??:?
>> The second one is not.  It is the fixup label, which will be hidden away
>> out-of-line, and lacking debug symbols.
>>> Were /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:
>>>
>>> case MSR_EFER:
>>>  rdmsr_normal:
>>> /* Everyone can read the MSR space. */
>>> /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
>>> _p(regs->ecx));*/
>>> HERE -->if ( rdmsr_safe(regs->ecx, val) )
>>> goto fail;
>> Moving the printk into the fail case will identify which is the
>> problematic MSR.  We need the value of regs->_ecx here (the low 32bits,
>> not the full 64 as the commented printk currently has).
>> I have a small todo list of misc debugging improvements.  I will add
>> this to the list.
>> ~Andrew
>>>  rdmsr_writeback:
>>> regs->eax = (uint32_t)val;
>>> regs->edx = (uint32_t)(val >> 32);
>>> break;
>>> }
>>> break;
>>>
> Don't know if the full 64bits is of equal use

It is (just with an unhelpful quantity of zeroes)

> , but here it is:
>
> (XEN) [2015-07-08 10:01:58.717] traps.c:2760:d14v0 Domain attempted but 
> failed RDMSR 0570.

Looks to be MSR_IA32_RTIT_CTL, which is part of the Intel Processor
Trace PMU driver (Linux/arch/x86/kernel/cpu/perf_event_intel_pt.c).  A
PV domain running on AMD absolutely shouldn't be attempting to read this.

It appears that pt_init() blindly probes the MSR without any
cpuid/vendor detection.
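
As a sketch of what such detection could look like (this is not the Linux
fix, and pt_probe_allowed is a hypothetical helper): Intel Processor Trace
is advertised in CPUID.(EAX=07H,ECX=0):EBX bit 25, so a driver could check
the vendor string and that bit before touching the MSR. The caller is
assumed to have already obtained both values from CPUID.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID.(EAX=07H,ECX=0):EBX bit 25 advertises Intel Processor Trace. */
#define CPUID7_EBX_INTEL_PT (UINT32_C(1) << 25)

/*
 * Decide whether probing the PT MSRs is reasonable at all.
 * vendor:     12-character CPUID vendor string, NUL terminated.
 * cpuid7_ebx: EBX as returned by CPUID leaf 7, subleaf 0.
 */
static bool pt_probe_allowed(const char *vendor, uint32_t cpuid7_ebx)
{
    if ( strcmp(vendor, "GenuineIntel") != 0 )
        return false;

    return (cpuid7_ebx & CPUID7_EBX_INTEL_PT) != 0;
}
```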

~Andrew



[Xen-devel] [v6][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map

2015-07-08 Thread Tiejun Chen
This introduces xc_reserved_device_memory_map() to libxc as a wrapper
around the new hypercall. It lets us get RDM entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries are
exposed. Otherwise we expose only the RDM entries specific to a given
SBDF.
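
The selection rule can be modelled in a few lines of standalone C. This is
only a sketch under assumptions: rdm_entry, rdm_filter, make_sbdf and
RDM_ALL are hypothetical names, the value of RDM_ALL is made up, and the
SBDF packing (segment in the upper 16 bits, then bus, then devfn) is an
illustrative choice. As in the hypervisor side of this series, matching
entries are counted even when the output buffer is too small, so the
caller can learn how many entries it needs room for.

```c
#include <stddef.h>
#include <stdint.h>

#define RDM_ALL 1u /* stand-in for PCI_DEV_RDM_ALL; the value is assumed */

struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
    uint32_t sbdf;      /* device the region belongs to */
};

/* Pack segment/bus/devfn into one identifier (illustrative layout). */
static uint32_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
}

/*
 * Copy matching entries into out[] (at most max of them) and return the
 * total number of matches, which may exceed max.
 */
static size_t rdm_filter(const struct rdm_entry *all, size_t n,
                         uint32_t flag, uint32_t sbdf,
                         struct rdm_entry *out, size_t max)
{
    size_t used = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( !(flag & RDM_ALL) && all[i].sbdf != sbdf )
            continue;
        if ( used < max )
            out[used] = all[i];
        used++;
    }

    return used;
}
```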

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Reviewed-by: Kevin Tian 
Acked-by: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Nothing is changed.


 tools/libxc/include/xenctrl.h |  8 
 tools/libxc/xc_domain.c   | 36 
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
   uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
 return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries)
+{
+int rc;
+struct xen_reserved_device_memory_map xrdmmap = {
+.flag = flag,
+.seg = seg,
+.bus = bus,
+.devfn = devfn,
+.nr_entries = *max_entries
+};
+DECLARE_HYPERCALL_BOUNCE(entries,
+ sizeof(struct xen_reserved_device_memory) *
+ *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ( xc_hypercall_bounce_pre(xch, entries) )
+return -1;
+
+set_xen_guest_handle(xrdmmap.buffer, entries);
+
+rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+  &xrdmmap, sizeof(xrdmmap));
+
+xc_hypercall_bounce_post(xch, entries);
+
+*max_entries = xrdmmap.nr_entries;
+
+return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries)
-- 
1.9.1




[Xen-devel] [v6][PATCH 00/16] Fix RMRR

2015-07-08 Thread Tiejun Chen
v6:

* Inside patch #01, add a comment to the nr_entries field inside
  xen_reserved_device_memory_map. Note this is from Jan.

* Inside patch #10, we rename some things to make our policy reasonable:
  "type" -> "strategy"
  "none" -> "ignore"
  Based on our discussion, we won't expose "ignore" at the xl level and just
  keep it as the default. The docs and the patch head description are synced
  accordingly.

* Inside patch #10, we fix some code style issues and, in particular, refine
  libxl__xc_device_get_rdm()

* Inside patch #16, we need to sync those renames introduced by patch #10.

v5:

* Fold our original patches #2 and #3 into one new patch, and introduce a
  new helper, clear_identity_p2m_entry(), which wraps
  guest_physmap_remove_page(). We use it to clean up our identity mapping.

* Just leave one bit, XEN_DOMCTL_DEV_RDM_RELAXED, as our policy flag, so
  now "0" means "strict" and "1" means "relaxed". DT devices simply ignore
  the flag field. All associated code comments are corrected accordingly.

* Just make sure the per-device policy always overrides the global policy,
  and clean up some associated comments and the patch head description.

* Improve some descriptions in doc.

* Make all rdm variables specific to .hvm

* Inside patch #6, we rename the field is_64bar inside struct bars to flag,
  and extend it to also indicate whether this bar is already allocated.

* Inside patch 11, rename xc_device_get_rdm() to libxl__xc_device_get_rdm(),
  replace malloc() with libxl__malloc(), and finally clean up the fallout.
  libxl__xc_device_get_rdm() should return a proper libxl error code,
  ERROR_FAIL. The allocated RDM entries are instead returned through an out
  parameter.

* The original patch #13 is sent out separately since it is not actually
  related to RMRR.

v4:

* Change one condition inside patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry",

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

 to make sure we just catch our requirement.

* Inside patch #3, "xen/vtd: create RMRR mapping",
  Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. And drop
  iommu_map_page() since actually ept_set_entry() can do this
  internally.

* Inside patch #4, "xen/passthrough: extend hypercall to support rdm
  reservation policy", add code comments to describe why we hard-code a
  policy flag in some cases like adding a device to hwdomain, and removing
  a device from user domain. Also fix one condition:

  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

  Additionally, add a range check on the flag passed in, to make future
  extensions possible (and to avoid ambiguity on what out of range values
  would mean).

* Inside patch #6, "hvmloader: get guest memory map into memory_map[]", we
  move some code related to e820 to its dedicated file, e820.c, consolidate
  "printf()+BUG()" and "BUG_ON()", and also avoid another fixed width type for
  the parameter of get_mem_mapping_layout()

* Inside patch #7, "hvmloader/pci: skip reserved ranges"
  We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all pci devices as before

  #2.2 Accommodate the unaligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem. But note that only
  64bit-bars can work with highmem, and as you mentioned we should also avoid
  expanding highmem where possible. So my implementation allocates 32bit-bars
  and 64bit-bars in order.

  1>. The first allocation round handles only 32bit-bars

  If we can finish allocating all 32bit-bars, we go on to allocate 64bit-bars
  with all remaining resources, including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource and go
  to the second allocation round 2>.

  2>. The second allocation round handles the remaining 32bit-bars

  In theory we should be able to finish allocating all 32bit-bars, then go to
  the third allocation round 3>.

  3>. The third allocation round handles 64bit-bars

  We first try to allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bars. This
  process should be the same as the original.
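
  The three rounds above can be modelled roughly as follows. This is a
  simplified standalone sketch, not the hvmloader code: struct bar,
  struct layout, alloc_low and allocate_bars are hypothetical names, and
  alignment, BAR ordering and the reserved regions themselves are ignored.
  Only the order of the rounds and the "grow the low hole just enough"
  step are shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bar {
    uint64_t size;
    bool is_64bit;
    bool allocated;
    bool in_highmem;
};

struct layout {
    uint64_t low_free;  /* bytes still free in the low MMIO hole */
    uint64_t low_grown; /* bytes the low hole was grown by */
    uint64_t high_used; /* bytes consumed from highmem */
};

/*
 * Allocate every unallocated BAR of the requested width from the low hole
 * while it fits. Returns the total size of the BARs that did not fit.
 */
static uint64_t alloc_low(struct bar *bars, size_t n, bool want_64bit,
                          struct layout *l)
{
    uint64_t shortfall = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        struct bar *b = &bars[i];

        if ( b->allocated || b->is_64bit != want_64bit )
            continue;
        if ( b->size <= l->low_free )
        {
            l->low_free -= b->size;
            b->allocated = true;
        }
        else
            shortfall += b->size;
    }

    return shortfall;
}

static void allocate_bars(struct bar *bars, size_t n, struct layout *l)
{
    /* Round 1: 32-bit BARs only. */
    uint64_t shortfall = alloc_low(bars, n, false, l);

    if ( shortfall )
    {
        /* Grow the low hole just enough, then round 2 for the rest. */
        uint64_t grow = shortfall - l->low_free;

        l->low_free += grow;
        l->low_grown += grow;
        alloc_low(bars, n, false, l);
    }

    /* Round 3: 64-bit BARs, low hole first, highmem as the fallback. */
    alloc_low(bars, n, true, l);
    for ( size_t i = 0; i < n; i++ )
    {
        if ( bars[i].allocated )
            continue;
        l->high_used += bars[i].size;
        bars[i].allocated = true;
        bars[i].in_highmem = true;
    }
}
```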

  #3.2 Address #2.2

  I'm trying to accommodate the unaligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
 

[Xen-devel] [v6][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map

2015-07-08 Thread Tiejun Chen
From: Jan Beulich 

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich 
CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Jan Beulich 
Signed-off-by: Tiejun Chen 
Acked-by: Kevin Tian 
---
v6:

* Add a comment to the nr_entries field inside xen_reserved_device_memory_map

v5:

* Nothing is changed.

v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 37 +++-
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm->map.buffer,
+ grdm->used_entries,
+ &rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(&grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  &grdm);
+
+if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, &grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm->map.buffer,
+grdm->used_entries,
+&rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 

[Xen-devel] [v6][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-08 Thread Tiejun Chen
This patch extends the existing hypercall to support rdm reservation policy.
We return an error or just print a warning message, depending on whether
the policy is "strict" or "relaxed", when reserving RDM regions in pfn space.
Note that in some special cases, e.g. adding a device to hwdomain or removing
a device from a user domain, 'relaxed' is good enough since this is always
safe for hwdomain.

CC: Tim Deegan 
CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Suravee Suthikulpanit 
CC: Aravind Gopalakrishnan 
CC: Ian Campbell 
CC: Stefano Stabellini 
CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  "0" means "strict" and "1" means "relaxed".

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describe why we hard-code a policy flag in some
  cases like adding a device to hwdomain, and removing a device from user
  domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add a range check on the flag passed in, to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   |  3 ++-
 xen/drivers/passthrough/pci.c   | 15 ---
 xen/drivers/passthrough/vtd/iommu.c | 40 +++--
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  3 +++
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+ret = 0;
+else
+ret = -EBUSY;
 printk(XENLOG_G_WARNING
"Cannot setup identity map d%d:%lx,"
" gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
 int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+/* The flag field doesn't matter to DT device. */
+rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
 
 if ( rc )
 goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..6e23fc6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
 return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u

[Xen-devel] [v6][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-08 Thread Tiejun Chen
This patch adds parsing for user-configurable parameters that specify
RDM resources and the corresponding policies:

Global RDM parameter:
rdm = "strategy=host,reserve=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Default per-device RDM policy is 'strict', while default global RDM policy
is 'relaxed'. When both policies are specified on a given region, 'strict' is
always preferred.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Acked-by: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other PCI options, the per-device policy always follows
  the global policy by default.

v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters,
  since this makes a lot of sense and this code is specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 90 
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++
 3 files changed, 107 insertions(+)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..098ad36 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE  9
+#define STATE_RDM_TYPE  10
+#define STATE_RESERVE_FLAG  11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str)
 {
 unsigned state = STATE_DOMAIN;
@@ -143,6 +146,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci 
*pcidev, const char *str
 pcidev->permissive = atoi(tok);
 }else if ( !strcmp(optkey, "seize") ) {
 pcidev->seize = atoi(tok);
+}else if ( !strcmp(optkey, "rdm_reserve") ) {
+if ( !strcmp(tok, "strict") ) {
+pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+} else if ( !strcmp(tok, "relaxed") ) {
+pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
+  " flag: 'strict' or 'relaxed'.",
+ tok);
+goto parse_error;
+}
 }else{
 XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
 }
@@ -167,6 +181,82 @@ parse_error:
 return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+unsigned state = STATE_TYPE;
+char *buf2, *tok, *ptr, *end;
+
+if (NULL == (buf2 = ptr = strdup(str)))
+return ERROR_NOMEM;
+
+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_TYPE;
+*ptr = '\0';
+if (strcmp(tok, "strategy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RDM_TYPE:
+if (*ptr == '\0' || *ptr == ',') {
+state = STATE_RESERVE_FLAG;
+*ptr = '\0';
+if (!strcmp(tok, "host")) {
+rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM type option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RESERVE_FLAG:
+if (*ptr == '=') {
+state = STATE_OPTIONS_V;
+*ptr = '\0';
+if (strcmp(tok, "reserve")) {
+XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_OPTIONS_V:
+if (*ptr == ',' || *ptr == '\0') {
+state = STATE_TERMINAL;
+*ptr = '\0';
+if (!strcmp(tok, "strict")) {
+rdm->reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+} else if (!strcmp(tok, "relaxed")) {
+rdm->reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM property flag value: %s",
+ tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+default:
+

[Xen-devel] [v6][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-08 Thread Tiejun Chen
Previously we always fixed that predefined boundary at 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the parameter "rdm_mem_boundary" in the .cfg file.
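
For illustration only (not part of this patch): a minimal standalone C sketch of the unit handling, assuming the .cfg value is given in megabytes, stored in KiB, and converted to bytes at the point of use. The macro values mirror LIBXL_MEMKB_DEFAULT and LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT from the diff below; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel meaning "not set in the config"; mirrors LIBXL_MEMKB_DEFAULT. */
#define MEMKB_DEFAULT (~0ULL)
/* 2G expressed in KiB; mirrors LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT. */
#define RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048ULL * 1024)

/* Hypothetical helper: convert the .cfg value (megabytes) to KiB. */
static uint64_t rdm_boundary_mb_to_memkb(uint64_t mbytes)
{
    assert(mbytes <= UINT64_MAX / 1024); /* guard against overflow */
    return mbytes * 1024;
}

/* Hypothetical helper: apply the default, then return the boundary in bytes. */
static uint64_t rdm_boundary_bytes(uint64_t memkb)
{
    if (memkb == MEMKB_DEFAULT)
        memkb = RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
    return memkb * 1024;
}
```

With nothing set, the boundary comes out as 0x80000000 bytes, i.e. the old fixed 2G.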

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Acked-by: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Make this variable "rdm_mem_boundary_memkb" specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 22 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 091e80d..7f65975 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is available
 on the XenVGAPassthrough L
 wiki page.
 
+=item B
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM regions may be scattered across the whole
+RAM space. Multiple RDM entries make this worse and lead to a complicated
+memory layout. So here we use a simple solution that avoids breaking the
+existing layout. When a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below reserved region to solve conflict;
+
+#2. Below a predefined boundary
+- Check strict/relaxed policy.
+"strict" policy makes libxl fail. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+"relaxed" policy issues a warning message and also marks this
+entry INVALID to indicate we shouldn't expose this entry to
+hvmloader.
+
+Here the default is 2G.
+
 =item B
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 7a0c57d..38a8c3a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
 {
 if (b_info->u.hvm.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
 b_info->u.hvm.rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+
+if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info->u.hvm.rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f3c39a0..62ef120 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = &d_config->b_info;
-/*
- * Currently we fix this as 2G to guarantte how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x8000;
 
 memset(&args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 ret = libxl__domain_device_construct_rdm(gc, d_config,
- rdm_mem_boundary,
+ info->u.hvm.rdm_mem_boundary_memkb*1024,
  &args);
 if (ret) {
 LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9f3f669..a936b8b 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -484,6 +484,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
("ms_vm_genid",  libxl_ms_vm_genid),
("serial_list",  libxl_string_list),
("rdm", libxl_rdm_reserve),
+

[Xen-devel] [v6][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-08 Thread Tiejun Chen
While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.

RMRR can theoretically reside in address space beyond 4G, but we never
see this in the real world. So in order to avoid breaking the highmem layout
we don't resolve highmem conflicts. Note this means a highmem RMRR can still
be supported if there is no conflict.

But in the case of lowmem, RMRR regions may be scattered across the whole RAM
space. Multiple RMRR entries make this worse and lead to a complicated
memory layout, and then it's hard to extend hvm_info_table{} so that
hvmloader can work it out. So here we use a simple solution that
avoids breaking the existing layout. When a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below reserved region to solve conflict;

#2. Below a predefined boundary (2G)
- Check strict/relaxed policy.
"strict" policy makes libxl fail. Note when both policies
are specified on a given region, 'strict' is always preferred.
"relaxed" policy issues a warning message and also marks this entry INVALID
to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.
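
A rough standalone sketch of the decision described above, for illustration only. overlaps_rdm() is copied from the diff below; everything else (the enum names, resolve_lowmem_conflict()) is hypothetical, and the sketch deliberately ignores where the RAM displaced from lowmem ends up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum rdm_policy { RDM_POLICY_STRICT, RDM_POLICY_RELAXED };

enum rdm_action {
    RDM_NO_CONFLICT,        /* region does not touch lowmem */
    RDM_MOVED_LOWMEM_END,   /* case #1: lowmem_end moved below the region */
    RDM_FAIL,               /* case #2, strict: domain build must fail */
    RDM_MARK_INVALID        /* case #2, relaxed: warn and hide the entry */
};

/* Same overlap test as overlaps_rdm() in the patch. */
static bool overlaps_rdm(uint64_t start, uint64_t memsize,
                         uint64_t rdm_start, uint64_t rdm_size)
{
    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
}

/* Hypothetical helper: decide what to do with one RDM region vs. lowmem. */
static enum rdm_action resolve_lowmem_conflict(uint64_t *lowmem_end,
                                               uint64_t rdm_start,
                                               uint64_t rdm_size,
                                               uint64_t boundary,
                                               enum rdm_policy policy)
{
    assert(lowmem_end);

    if (!overlaps_rdm(0, *lowmem_end, rdm_start, rdm_size))
        return RDM_NO_CONFLICT;

    /* #1. Above the boundary: shrink lowmem so it ends below the region. */
    if (rdm_start > boundary) {
        *lowmem_end = rdm_start;
        return RDM_MOVED_LOWMEM_END;
    }

    /* #2. Below the boundary: the strict/relaxed policy decides. */
    return policy == RDM_POLICY_STRICT ? RDM_FAIL : RDM_MARK_INVALID;
}
```

For example, with lowmem ending at 0xAF000000 (memory = 2800) and an RMRR at 0xac6d3000, the region sits above the 2G boundary, so lowmem_end is moved down to 0xac6d3000 whichever policy is in effect.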

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Acked-by: Wei Liu 
Signed-off-by: Tiejun Chen 
Reviewed-by: Kevin Tian 
---
v6:

* fix some code styles
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always overrides the global
  policy and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() to libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally clean up the fallout.
* libxl__xc_device_get_rdm() should return a proper libxl error code, ERROR_FAIL.
  The allocated RDM entries are instead returned through an out parameter.

v4:

* Consistently use the term "RDM".
* Unconditionally set *nr_entries to 0
* Move everything needed to provide a parameter that sets our predefined
  boundary dynamically into a separate, later patch

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 264 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 298 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b884fa1..7a0c57d 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info->type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..d68ea89 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,270 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r <= 0);
+/* "0" means we have no rdm entries. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+*xrdm = libxl__malloc(gc,
+  *nr_entries * sizeof(xen_reserved_device_memory_t));
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, "Could not get reserved device memory maps.\n");
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+ uint64_t rdm_start, uint64_t rdm_size)
+{
+return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address

[Xen-devel] [v6][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-08 Thread Tiejun Chen
Currently we don't intend to support assigning this kind of device
with a shared RMRR, since a shared RMRR is rare in our experience.
Later we can group the devices that share an RMRR, and
then allow all devices within a group to be assigned to the
same domain.

CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Tiejun Chen 
Acked-by: Kevin Tian 
---
v6:

* Nothing is changed.

v5:
 
* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c833290..095fb1d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,13 +2297,39 @@ static int intel_iommu_assign_device(
 if ( list_empty(&acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn )
+{
+if ( rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+   " cannot assign %04x:%02x:%02x.%u"
+   " with shared RMRR for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   d->domain_id);
+return -EPERM;
+}
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev->seg;
-bus = pdev->bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v6][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-08 Thread Tiejun Chen
When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all PCI devices as before

  #2.2 Accommodate reserved memory regions that are not aligned

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem. But note that only
  64bit-bars can work with highmem, and as you mentioned we should also avoid
  expanding highmem where possible. So my implementation allocates 32bit-bars
  and 64bit-bars in order.

  1>. The first allocation round covers only 32bit-bars

  If we can finish allocating all 32bit-bars, we go on to allocate 64bit-bars
  with all remaining resources, including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource and go
  to the second allocation round 2>.

  2>. The second allocation round covers the remaining 32bit-bars

  In theory we should be able to finish allocating all 32bit-bars, then go to
  the third allocation round 3>.

  3>. The third allocation round covers 64bit-bars

  We first try to allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate 64bit-bars. This process
  should be the same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other smaller bars can be allocated if an mmio hole exists between
  resource->base and reserved device memory. If such a hole exists, we simply
  move on and try to allocate the next bar, since all bars are in descending
  order of size. If not, we need to move resource->base to reserved_end and
  reallocate this bar.
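
  For illustration only, here is a simplified standalone sketch of the
  "move resource->base to reserved_end" branch of #3.2. It is not the code in
  this patch: it always skips past a conflicting reserved range and never goes
  back to fill the hole in front of it with a smaller bar. struct rdm_range and
  alloc_bar_skipping_rdm() are hypothetical names; struct resource matches the
  one in pci.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct resource { uint64_t base, max; };     /* free window [base, max) */
struct rdm_range { uint64_t start, end; };   /* reserved    [start, end) */

/* Round v up to a multiple of sz; sz must be a power of two. */
static uint64_t align_up(uint64_t v, uint64_t sz)
{
    return (v + sz - 1) & ~(sz - 1);
}

/*
 * Hypothetical helper: place one naturally aligned BAR of size bar_sz in r,
 * moving past every reserved range it would overlap. On success the chosen
 * address is stored in *out_base and r->base is advanced.
 */
static bool alloc_bar_skipping_rdm(struct resource *r, uint64_t bar_sz,
                                   const struct rdm_range *rdm,
                                   unsigned int nr_rdm, uint64_t *out_base)
{
    uint64_t base;
    unsigned int i;
    bool moved;

    assert(bar_sz && !(bar_sz & (bar_sz - 1)));  /* power of two */

    base = align_up(r->base, bar_sz);
    do {
        moved = false;
        /* Give up if the BAR no longer fits (or the address wrapped). */
        if (base < r->base || base + bar_sz < base || base + bar_sz > r->max)
            return false;
        for (i = 0; i < nr_rdm; i++) {
            if (base < rdm[i].end && base + bar_sz > rdm[i].start) {
                /* Conflict: restart from the end of the reserved range. */
                base = align_up(rdm[i].end, bar_sz);
                moved = true;
                break;
            }
        }
    } while (moved);

    *out_base = base;
    r->base = base + bar_sz;
    return true;
}
```

  The cost of this simplification is that the space between the old
  resource->base and the reserved range is lost, which is exactly what the
  hole check described above is meant to avoid.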

 tools/firmware/hvmloader/pci.c | 194 ++---
 1 file changed, 164 insertions(+), 30 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..397f3b7 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+struct xen_add_to_physmap xatp;
+unsigned int nr_pages = min_t(
+unsigned int,
+hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+(1u << 16) - 1);
+if ( hvm_info->high_mem_pgend == 0 )
+hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+hvm_info->low_mem_pgend -= nr_pages;
+printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+   " for lowmem MMIO hole\n",
+   nr_pages,
+   PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)high_mem_pgend;
+xatp.size  = nr_pages;
+if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+BUG();
+hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,17 +75,22 @@ void pci_setup(void)
 /* Resources assignable to PCI devices via BARs. */
 struct resource {
 uint64_t base, max;
-} *resource, mem_resource, high_mem_resource, io_resource;
+} *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;
 
 /* Create a list of device BARs in descending order of size. */
 struct bars {
-uint32_t is_64bar;
+#define PCI_BAR_IS_64BIT 0x1
+#define PCI_BAR_IS_ALLOCATED 0x2
+uint32_t flag;
 uint32_t devfn;
 uint32_t bar_reg;
 uint64_t bar_sz;
 } *bars = (struct bars *)scratch_start;
-unsigned int i, nr_bars = 0;
-uint64_t mmio_hole_size = 0;
+unsigned int i, j, n, nr_bars = 0;
+uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+bool bar32_allocating = 0;
+uint64_t mmio32_unallocated_total = 0;
+unsigned long cur_pci_mem_start = 0;
 
 const char *s;
 /*
@@ -222,7 +252,7 @@ void pci_setup(void)
 if ( i != nr_bars )
 memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
 
-bars[i].is_64bar = is_64bar;
+bars[i].flag = is_64bar ? PCI_BAR_IS_64BIT : 0;
 bars[

[Xen-devel] [v6][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-08 Thread Tiejun Chen
A USB RMRR may conflict with the guest BIOS region. In that case, the previous
implementation simply skipped the identity mapping setup. Now we can handle
this scenario cleanly with the new policy mechanism, so the previous hack
can be removed.

CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Tiejun Chen 
Acked-by: Kevin Tian 
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 56f5911..c833290 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2245,11 +2245,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2303,13 +2301,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include 
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1




[Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Tiejun Chen
This patch introduces user configurable parameters to specify RDM
resources and the corresponding policies:

Global RDM parameter:
rdm = "strategy=host,reserve=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

The global RDM parameter, "strategy", allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is good for handling the hotplug scenario. In the future
this parameter may be further extended to allow specifying arbitrary regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. By default this isn't set so we don't
check all rdms. Instead, we just check the rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure a conflict does exist.

The 'strict/relaxed' policy decides how to handle a conflict when reserving RDM
regions in pfn space. If a conflict exists, 'strict' means an immediate error
so the VM can't keep running, while 'relaxed' allows moving forward with a
warning message.

The default per-device RDM policy is the same as the default global RDM policy,
'relaxed'. The per-device policy overrides the global policy, as with other
options.
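
To make the two parameters concrete, here is a small standalone C sketch, for illustration only. It is not xlu_rdm_parse() from the xl/libxlu patch: that one is a state machine, while this sketch simply splits on ',' and '=' and accepts the keys in any order. All names (rdm_cfg, parse_rdm(), effective_reserve()) are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum rdm_strategy { RDM_STRATEGY_IGNORE = 0, RDM_STRATEGY_HOST };
enum rdm_reserve  { RDM_RESERVE_INVALID = 0, RDM_RESERVE_STRICT,
                    RDM_RESERVE_RELAXED };

struct rdm_cfg {
    enum rdm_strategy strategy;
    enum rdm_reserve reserve;
};

/* Parse e.g. "strategy=host,reserve=strict". Returns 0 on success, -1 on error. */
static int parse_rdm(const char *str, struct rdm_cfg *out)
{
    char buf[128];
    char *tok, *val;

    assert(str && out);
    if (strlen(str) >= sizeof(buf))
        return -1;
    strcpy(buf, str);            /* strtok() modifies its input */

    for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        val = strchr(tok, '=');
        if (!val)
            return -1;
        *val++ = '\0';           /* split "key=value" in place */

        if (!strcmp(tok, "strategy")) {
            if (strcmp(val, "host"))
                return -1;       /* "host" is the only exposed strategy */
            out->strategy = RDM_STRATEGY_HOST;
        } else if (!strcmp(tok, "reserve")) {
            if (!strcmp(val, "strict"))
                out->reserve = RDM_RESERVE_STRICT;
            else if (!strcmp(val, "relaxed"))
                out->reserve = RDM_RESERVE_RELAXED;
            else
                return -1;
        } else {
            return -1;           /* unknown key */
        }
    }
    return 0;
}

/* The per-device policy overrides the global one; the default is relaxed. */
static enum rdm_reserve effective_reserve(enum rdm_reserve global,
                                          enum rdm_reserve per_dev)
{
    if (per_dev != RDM_RESERVE_INVALID)
        return per_dev;
    if (global != RDM_RESERVE_INVALID)
        return global;
    return RDM_RESERVE_RELAXED;
}
```

So with rdm = "strategy=host,reserve=strict" and a device given as 'sbdf, rdm_reserve=relaxed', that one device ends up relaxed while everything else stays strict.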

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Some rename to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
* Don't expose "ignore" in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Move those changes to xl/libxlu into a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..091e80d 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B
+
+Possible Bs are:
+
+=over 4
+
+=item B
+
+Currently there is only one valid type:
+
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does exist.
+
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad80 end_address afff
+
+In this conflict case,
+
+#1. If B is set to "host", for example,
+
+rdm = "strategy=host,reserve=strict" or rdm = "strategy=host,reserve=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B as described below.
+
+#2. If B is not set at all, but
+
+pci = [ 'sbdf_A, rdm_reserve=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B as described inside pci options.
+
+=item B
+
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+
+When that conflict is unsolved,
+
+"strict" means VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+
+"relaxed" allows VM to be created but may cause VM to crash if
+pass-through device accesses RDM. For example, the Windows IGD GFX driver
+always accesses RDM regions, so it leads to a VM crash.
+
+Note this may be overridden by rdm_reserve option in PCI device configuration.
+
+=back
+
+=back
+
 =item B
 
 Specifies the host PCI devices to passthrough to this guest. Each 
B
@@ -717,6 +790,14 @@ dom0 without confirmation.  Please use wit

[Xen-devel] [v6][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-08 Thread Tiejun Chen
Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be the same as before if the
platform has no RDM or we disable RDM (by default).
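
For illustration only, a standalone sketch of the table layout described above: one lowmem entry starting at 1M, one reserved entry per RDM region, and one highmem entry above 4G if there is any. It is not the libxl code; build_e820(), struct rdm_region and the macro names are hypothetical, and the struct is a simplified stand-in for the real e820entry. Type values 1 (RAM) and 2 (reserved) are the standard e820 ones.

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM       1
#define E820_RESERVED  2
#define LOW_MEM_START  0x100000ULL   /* lowmem entry starts at 1M */
#define FOUR_GB        (1ULL << 32)  /* highmem starts at 4G */

struct e820entry { uint64_t addr, size; uint32_t type; };
struct rdm_region { uint64_t start, size; };

/* Returns the number of entries written, or 0 if the map cannot be built. */
static unsigned int build_e820(struct e820entry *map, unsigned int max_entries,
                               uint64_t lowmem_end, uint64_t highmem_end,
                               const struct rdm_region *rdms,
                               unsigned int nr_rdms)
{
    unsigned int nr = 0, i;
    unsigned int needed = 1 + nr_rdms + (highmem_end > FOUR_GB ? 1 : 0);

    assert(map);
    if (needed > max_entries || lowmem_end <= LOW_MEM_START)
        return 0;

    /* #1. Low memory */
    map[nr].addr = LOW_MEM_START;
    map[nr].size = lowmem_end - LOW_MEM_START;
    map[nr].type = E820_RAM;
    nr++;

    /* #2. RDM regions, if any */
    for (i = 0; i < nr_rdms; i++) {
        map[nr].addr = rdms[i].start;
        map[nr].size = rdms[i].size;
        map[nr].type = E820_RESERVED;
        nr++;
    }

    /* #3. High memory, if any */
    if (highmem_end > FOUR_GB) {
        map[nr].addr = FOUR_GB;
        map[nr].size = highmem_end - FOUR_GB;
        map[nr].type = E820_RAM;
        nr++;
    }

    return nr;
}
```

With no RDM regions and no highmem this yields a single RAM entry, which matches the "same as before" case noted above.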

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Acked-by: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Rephrase patch's short log
* Make libxl__domain_construct_e820() hidden

v4:

* Use goto style error handling.
* Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate local e820.

 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_internal.h | 24 +
 tools/libxl/libxl_x86.c  | 83 
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..41da479 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
+LOG(ERROR, "setting domain memory map failed");
+goto out;
+}
+
 ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
&state->store_mfn, state->console_port,
&state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b4d8419..a50449a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3794,6 +3794,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
 const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+_hidden int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..be297b2 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x10
+int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i < d_config->num_rdms; i++)
+if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries >= E820MAX) {
+LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+rc = ERROR_INVAL;
+goto out;
+}
+
+e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+/* Low memory */
+e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].type = E820_RAM;
+nr++;
+
+/* RDM mapping */
+for (i = 0; i < d_config->num_rdms; i++) {
+if (d_config

[Xen-devel] [v6][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-08 Thread Tiejun Chen
RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code fails to set up the
correct mapping when VT-d shares the EPT page table, which leads to problems
when assigning devices (e.g. GPU) with RMRR reported. So instead, this
patch sets up the identity mapping in the p2m layer, regardless of
whether EPT is shared or not. And we still keep creating the VT-d table.

We also need to introduce a pair of helpers to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro to wrap guest_physmap_remove_page(),
returning a value as necessary.

CC: Tim Deegan 
CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Yang Zhang 
CC: Kevin Tian 
Reviewed-by: Kevin Tian 
Reviewed-by: Tim Deegan 
Acked-by: George Dunlap 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Fold our original patches #2 and #3 into this new one

* Introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). And we use this to clean up our
  identity mapping.

v4:

* Change that original condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
  p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+   " gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-   IOMMUF_readable|IOMMUF_writable);
+int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
 if ( err )
 retur

[Xen-devel] [v6][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-08 Thread Tiejun Chen
This patch enables XENMEM_memory_map for HVM guests, so hvmloader can
use it to set up the e820 mappings.

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
Signed-off-by: Tiejun Chen 
Reviewed-by: Tim Deegan 
Reviewed-by: Kevin Tian 
Acked-by: Jan Beulich 
Acked-by: George Dunlap 
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v6][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-08 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note that this also brings some fallout to the Python usage of xc_assign_device().

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
CC: David Scott 
Acked-by: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Fix the flag field to "0" for DT devices

v4:

* In the patch description, explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret < 0)
failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c 
b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..a4928c6 100644
--- a/tools/python/xen/lo

[Xen-devel] [v6][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-08 Thread Tiejun Chen
Now we get this map layout by calling XENMEM_memory_map and
save it in one global variable, memory_map[]. It should
include the lowmem range, the rdm range and the highmem range.
Note that the rdm range and the highmem range may not exist in
some cases.

Here we also need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of a conflict, since this is
a rare possibility in the real world.
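
The conflict check described above boils down to an interval-overlap test over the reserved entries. A minimal standalone sketch follows; the struct, the SKETCH_E820_RESERVED value and the function names are illustrative assumptions, not the hvmloader definitions, and ranges are treated as half-open [start, start + size).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_E820_RESERVED 2u   /* assumed type code for reserved entries */

/* Hypothetical stand-in for one memory map entry. */
struct sketch_map_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* True if [a_start, a_start + a_size) intersects [b_start, b_start + b_size). */
bool sketch_overlap(uint64_t a_start, uint64_t a_size,
                    uint64_t b_start, uint64_t b_size)
{
    return (a_start + a_size > b_start) && (a_start < b_start + b_size);
}

/*
 * Return the index of the first reserved entry that overlaps
 * [start, start + size), or -1 if there is no conflict.
 */
int sketch_first_conflict(const struct sketch_map_entry *map, unsigned int nr,
                          uint64_t start, uint64_t size)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        if ( map[i].type == SKETCH_E820_RESERVED &&
             sketch_overlap(start, size, map[i].addr, map[i].size) )
            return (int)i;

    return -1;
}
```

Adjacent ranges (one ends exactly where the other starts) do not count as overlapping here. The sketch does not guard against start + size wrapping around 64 bits.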

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
Reviewed-by: Kevin Tian 
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Move some code related to e820 into its own file, e820.c.

* Consolidate "printf()+BUG()" and "BUG_ON()"

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 35 +++
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..3e53c47 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i < nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED )
+{
+if ( check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX 128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include 
 #include 
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+*max_entries = memmap.nr_entries;
+
+return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
 static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_

[Xen-devel] [v6][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-08 Thread Tiejun Chen
Now we can use that memory map to build our final
e820 table, but all e820 entries may need to be
reordered afterwards.
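
The reordering step amounts to sorting the entries by start address. Below is a minimal standalone sketch of that idea using a simple exchange sort; struct sketch_e820entry and sketch_sort_e820() are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Hypothetical stand-in for an e820 entry. */
struct sketch_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Sort 'nr' entries in place by ascending start address. */
void sketch_sort_e820(struct sketch_e820entry *e, unsigned int nr)
{
    unsigned int i, j;

    /* "j + 1 < nr" rather than "j < nr - 1": safe even when nr == 0. */
    for ( j = 0; j + 1 < nr; j++ )
    {
        for ( i = j + 1; i < nr; i++ )
        {
            if ( e[j].addr > e[i].addr )
            {
                struct sketch_e820entry tmp = e[j];

                e[j] = e[i];
                e[i] = tmp;
            }
        }
    }
}
```

A quadratic sort is arguably acceptable here because the table is small and is built only once.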

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.

 tools/firmware/hvmloader/e820.c | 80 +
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 3e53c47..aa2569f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint64_t add_high_mem = 0;
+uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
-
-if ( hvm_info->high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1 << 32);
-e820[nr].size =
-((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
+
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM &&
+ low_mem_end > e820[i].addr && low_mem_end < end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+if ( e820[i].type == E820_RAM &&
+ e820[i].addr > (1ull << 32))
+e820[i].size += add_high_mem;
+}
+}
+
+/* Finally we need to reorder all e820 entries. */
+for ( j = 0; j < nr-1; j++ )
+{
+for ( i = j+1; i < nr; i++ )
+{
+if ( e820[j].addr > e820[i].addr )
+{
+struct e820entry tmp;
+tmp = e820[j];
+e820[j] = e820[i];
+e820[i] = tmp;
+}
+}
+}
+
 return nr;
 }
 
-- 
1.9.1




[Xen-devel] [PATCH V4 0/3] Vm_event memory introspection helpers

2015-07-08 Thread Razvan Cojocaru
Version 4 of the series addresses V3 reviews, and consists of:

[PATCH 1/3] xen/mem_access: Support for memory-content hiding
[PATCH 2/3] xen/vm_event: Support for guest-requested events
[PATCH 3/3] xen/vm_event: Deny register writes if refused by
vm_event reply

All the patches in this version have been acked by at least one
person. For [PATCH 3/3], Tamas has suggested that I move the
DENY logic from p2m.c to dedicated files, which I've done here.
Since this is simply a trivial move without any modifications
to the logic itself, I've kept both acks received for the patch;
George's ack should in any case not be an issue, as it only
concerned the mm parts which are unchanged, but if I shouldn't
have kept Jan's ack then please disregard it.

This version of the series assumes the patch "vm_event: Rename
MEM_ACCESS_EMULATE and MEM_ACCESS_EMULATE_NOWRITE" that I
submitted yesterday. I've not added that patch to this series
because I wanted it to be available for Tamas as well, as he's
working on a parallel series and I had hoped that this way
would be better than him having to wait for this whole series
to go in.


Thank you,
Razvan




[Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

2015-07-08 Thread Razvan Cojocaru
Deny register writes if a vm_client subscribed to mov_to_msr or
control register write events forbids them. Currently supported for
MSR, CR0, CR3 and CR4 events.
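
One way to picture the mechanism: the write is parked when the event is sent, the monitor's reply may cancel it, and whatever is still pending is committed when the vCPU resumes. The sketch below is a simplified standalone model under that assumption; all names are hypothetical and it tracks a single register rather than the per-register flags used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one parked register write. */
struct sketch_pending_write {
    bool do_write;    /* a write is waiting for the monitor's verdict */
    uint64_t value;   /* value the guest tried to write */
};

/* Hypothetical model of the target register. */
struct sketch_reg {
    uint64_t value;
};

/* The guest attempted a write while a monitor is subscribed: park it. */
void sketch_defer_write(struct sketch_pending_write *w, uint64_t value)
{
    w->value = value;
    w->do_write = true;
}

/* The monitor replied: a deny reply discards the parked write. */
void sketch_handle_reply(struct sketch_pending_write *w, bool deny)
{
    if ( deny )
        w->do_write = false;
}

/* On vCPU resume: commit the write only if it is still pending. */
void sketch_resume(struct sketch_pending_write *w, struct sketch_reg *reg)
{
    if ( w->do_write )
    {
        reg->value = w->value;
        w->do_write = false;
    }
}
```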

Signed-off-by: Razvan Cojocaru 
Acked-by: George Dunlap 
Acked-by: Jan Beulich 

---
Changes since V3:
 - Renamed MEM_ACCESS_FLAG_DENY to VM_EVENT_FLAG_DENY (and fixed
   the bit shift appropriately).
 - Moved the DENY vm_event response logic from p2m.c to newly
   added dedicated files for vm_event handling, as suggested
   by Tamas Lengyel.
---
 MAINTAINERS   |1 +
 xen/arch/x86/Makefile |1 +
 xen/arch/x86/domain.c |2 +
 xen/arch/x86/hvm/emulate.c|8 +--
 xen/arch/x86/hvm/event.c  |5 +-
 xen/arch/x86/hvm/hvm.c|  118 -
 xen/arch/x86/hvm/svm/nestedsvm.c  |   14 ++---
 xen/arch/x86/hvm/svm/svm.c|2 +-
 xen/arch/x86/hvm/vmx/vmx.c|   15 +++--
 xen/arch/x86/hvm/vmx/vvmx.c   |   18 +++---
 xen/arch/x86/vm_event.c   |   33 +++
 xen/common/vm_event.c |9 +++
 xen/include/asm-arm/vm_event.h|   12 
 xen/include/asm-x86/domain.h  |   18 +-
 xen/include/asm-x86/hvm/event.h   |9 ++-
 xen/include/asm-x86/hvm/support.h |9 +--
 xen/include/asm-x86/vm_event.h|8 +++
 xen/include/public/vm_event.h |6 ++
 18 files changed, 242 insertions(+), 46 deletions(-)
 create mode 100644 xen/arch/x86/vm_event.c
 create mode 100644 xen/include/asm-arm/vm_event.h
 create mode 100644 xen/include/asm-x86/vm_event.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6b1068e..59c0822 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -383,6 +383,7 @@ F:  xen/common/vm_event.c
 F: xen/common/mem_access.c
 F: xen/arch/x86/hvm/event.c
 F: xen/arch/x86/monitor.c
+F: xen/arch/x86/vm_event.c
 
 XENTRACE
 M: George Dunlap 
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 37e547c..5f24951 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -60,6 +60,7 @@ obj-y += machine_kexec.o
 obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
+obj-y += vm_event.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a8fe046..c688ab9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -678,6 +678,8 @@ void arch_domain_destroy(struct domain *d)
 cleanup_domain_irq_mapping(d);
 
 psr_free_rmid(d);
+
+xfree(d->arch.event_write_data);
 }
 
 void arch_domain_shutdown(struct domain *d)
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index c6ccb1f..780adb4 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1389,14 +1389,14 @@ static int hvmemul_write_cr(
 switch ( reg )
 {
 case 0:
-return hvm_set_cr0(val);
+return hvm_set_cr0(val, 1);
 case 2:
 current->arch.hvm_vcpu.guest_cr[2] = val;
 return X86EMUL_OKAY;
 case 3:
-return hvm_set_cr3(val);
+return hvm_set_cr3(val, 1);
 case 4:
-return hvm_set_cr4(val);
+return hvm_set_cr4(val, 1);
 default:
 break;
 }
@@ -1417,7 +1417,7 @@ static int hvmemul_write_msr(
 uint64_t val,
 struct x86_emulate_ctxt *ctxt)
 {
-return hvm_msr_write_intercept(reg, val);
+return hvm_msr_write_intercept(reg, val, 1);
 }
 
 static int hvmemul_wbinvd(
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 17638ea..042e583 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t 
*req)
 return 1;
 }
 
-void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
+bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
 {
 struct arch_domain *currad = &current->domain->arch;
 unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
@@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, 
unsigned long old)
 
 hvm_event_traps(currad->monitor.write_ctrlreg_sync & ctrlreg_bitmask,
 &req);
+return 1;
 }
+
+return 0;
 }
 
 void hvm_event_msr(unsigned int msr, uint64_t value)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 536d1c8..abfca33 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -468,6 +469,35 @@ void hvm_do_resume(struct vcpu *v)
 }
 }
 
+if ( unlikely(d->arch.event_write_data) )
+{
+struct monitor_write_data *w = &d->arch.event_write_data[v->vcpu_id];
+
+if ( w->do_write.msr )
+{
+hvm_msr_write_intercept(w->msr, w->value, 0);
+w->do_write.msr = 0;
+}
+
+if ( w->do_write.cr0 )
+{
+hvm_set_cr0(w->cr0, 0);
+

[Xen-devel] [PATCH V4 1/3] xen/mem_access: Support for memory-content hiding

2015-07-08 Thread Razvan Cojocaru
This patch adds support for memory-content hiding, by modifying the
value returned by emulated instructions that read certain memory
addresses that contain sensitive data. The patch only applies to
cases where MEM_ACCESS_EMULATE or MEM_ACCESS_EMULATE_NOWRITE have
been set to a vm_event response.
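
The core of the substitution is a bounded copy followed by zero-filling, so that a short replacement buffer never exposes stale bytes. A minimal standalone sketch is shown below; sketch_fill_read() is a hypothetical helper, and unlike the hunks in this patch it also zero-fills when the supplied buffer is empty.

```c
#include <string.h>

/*
 * Fill 'dst' (the 'bytes' an emulated read expects) from a
 * monitor-supplied buffer of 'src_size' bytes, then zero the rest.
 */
void sketch_fill_read(void *dst, unsigned int bytes,
                      const void *src, unsigned int src_size)
{
    /* Never copy more than either side can hold. */
    unsigned int safe_bytes = bytes < src_size ? bytes : src_size;

    if ( safe_bytes )
        memcpy(dst, src, safe_bytes);

    /* Zero whatever the supplied data did not cover. */
    if ( bytes > safe_bytes )
        memset((char *)dst + safe_bytes, 0, bytes - safe_bytes);
}
```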

Signed-off-by: Razvan Cojocaru 
Acked-by: George Dunlap 

---
Changes since V3:
 - Renamed MEM_ACCESS_SET_EMUL_READ_DATA to
   VM_EVENT_FLAG_SET_EMUL_READ_DATA and updated its comment.
 - Removed xfree(v->arch.vm_event.emul_read_data) from
   free_vcpu_struct().
 - Returning X86EMUL_UNHANDLEABLE from hvmemul_cmpxchg() when
   !curr->arch.vm_event.emul_read_data.
 - Replaced xmalloc_bytes() with xmalloc_array() in
   hvmemul_rep_outs_set_context().
 - Setting the rest of the buffer to zero in hvmemul_rep_movs()
   (no longer leaking heap contents).
 - No longer memset()ing the whole buffer before copy (just zeroing
   out the rest).
 - Moved hvmemul_ctxt->set_context = 0 to hvm_emulate_prepare() and
   removed hvm_emulate_one_set_context().
---
 tools/tests/xen-access/xen-access.c |2 +-
 xen/arch/x86/hvm/emulate.c  |  138 ++-
 xen/arch/x86/hvm/event.c|   50 ++---
 xen/arch/x86/mm/p2m.c   |   92 +--
 xen/common/domain.c |2 +
 xen/common/vm_event.c   |   23 ++
 xen/include/asm-x86/domain.h|2 +
 xen/include/asm-x86/hvm/emulate.h   |   10 ++-
 xen/include/public/vm_event.h   |   31 ++--
 9 files changed, 274 insertions(+), 76 deletions(-)

diff --git a/tools/tests/xen-access/xen-access.c 
b/tools/tests/xen-access/xen-access.c
index 12ab921..e6ca9ba 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -530,7 +530,7 @@ int main(int argc, char *argv[])
 break;
 case VM_EVENT_REASON_SOFTWARE_BREAKPOINT:
 printf("Breakpoint: rip=%016"PRIx64", gfn=%"PRIx64" (vcpu %d)\n",
-   req.regs.x86.rip,
+   req.data.regs.x86.rip,
req.u.software_breakpoint.gfn,
req.vcpu_id);
 
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index fe5661d..c6ccb1f 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -653,6 +653,31 @@ static int hvmemul_read(
 unsigned int bytes,
 struct x86_emulate_ctxt *ctxt)
 {
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+if ( unlikely(hvmemul_ctxt->set_context) )
+{
+struct vcpu *curr = current;
+unsigned int safe_bytes;
+
+if ( !curr->arch.vm_event.emul_read_data )
+return X86EMUL_UNHANDLEABLE;
+
+safe_bytes = min_t(unsigned int,
+   bytes, curr->arch.vm_event.emul_read_data->size);
+
+if ( safe_bytes )
+{
+memcpy(p_data, curr->arch.vm_event.emul_read_data->data, safe_bytes);
+
+if ( bytes > safe_bytes )
+memset(p_data + safe_bytes, 0, bytes - safe_bytes);
+}
+
+return X86EMUL_OKAY;
+}
+
 return __hvmemul_read(
 seg, offset, p_data, bytes, hvm_access_read,
 container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
@@ -893,6 +918,28 @@ static int hvmemul_cmpxchg(
 unsigned int bytes,
 struct x86_emulate_ctxt *ctxt)
 {
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+if ( unlikely(hvmemul_ctxt->set_context) )
+{
+struct vcpu *curr = current;
+
+if ( curr->arch.vm_event.emul_read_data )
+{
+unsigned int safe_bytes = min_t(unsigned int, bytes,
+curr->arch.vm_event.emul_read_data->size);
+
+memcpy(p_new, curr->arch.vm_event.emul_read_data->data,
+   safe_bytes);
+
+if ( bytes > safe_bytes )
+memset(p_new + safe_bytes, 0, bytes - safe_bytes);
+}
+else
+return X86EMUL_UNHANDLEABLE;
+}
+
 /* Fix this in case the guest is really relying on r-m-w atomicity. */
 return hvmemul_write(seg, offset, p_new, bytes, ctxt);
 }
@@ -935,6 +982,43 @@ static int hvmemul_rep_ins(
!!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
 }
 
+static int hvmemul_rep_outs_set_context(
+enum x86_segment src_seg,
+unsigned long src_offset,
+uint16_t dst_port,
+unsigned int bytes_per_rep,
+unsigned long *reps,
+struct x86_emulate_ctxt *ctxt)
+{
+unsigned int bytes = *reps * bytes_per_rep;
+struct vcpu *curr = current;
+unsigned int safe_bytes;
+char *buf = NULL;
+int rc;
+
+if ( !curr->arch.vm_event.emul_read_data )
+return X86EMUL_UNHANDLEABLE;
+
+buf = xmalloc_array(char, bytes);
+
+if ( buf == NULL )
+return X86EMUL_UNHANDLEABLE;
+
+ 

[Xen-devel] [PATCH V4 2/3] xen/vm_event: Support for guest-requested events

2015-07-08 Thread Razvan Cojocaru
Added support for a new class of vm_events: VM_EVENT_REASON_GUEST_REQUEST,
sent via HVMOP_guest_request_vm_event. The guest can request that a
generic vm_event (containing only the vm_event-filled guest registers
as information) be sent to userspace by setting up the correct
registers and doing a VMCALL. For example, for a 32-bit guest, this
means: EAX = 34 (hvmop), EBX = 24 (HVMOP_guest_request_vm_event),
ECX = 0 (NULL required for the hypercall parameter, reserved).

Signed-off-by: Razvan Cojocaru 
Acked-by: Tamas K Lengyel 
Acked-by: Wei Liu 
Acked-by: Jan Beulich 

---
Changes since V3:
 - None, just added acks.
---
 tools/libxc/include/xenctrl.h   |2 ++
 tools/libxc/xc_monitor.c|   15 +++
 xen/arch/x86/hvm/event.c|   16 
 xen/arch/x86/hvm/hvm.c  |8 +++-
 xen/arch/x86/monitor.c  |   16 
 xen/include/asm-x86/domain.h|   16 +---
 xen/include/asm-x86/hvm/event.h |1 +
 xen/include/public/domctl.h |6 ++
 xen/include/public/hvm/hvm_op.h |2 ++
 xen/include/public/vm_event.h   |2 ++
 10 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..4ce519a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2384,6 +2384,8 @@ int xc_monitor_mov_to_msr(xc_interface *xch, domid_t 
domain_id, bool enable,
 int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, bool enable);
 int xc_monitor_software_breakpoint(xc_interface *xch, domid_t domain_id,
bool enable);
+int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id,
+ bool enable, bool sync);
 
 /***
  * Memory sharing operations.
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index 63013de..d979122 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -105,3 +105,18 @@ int xc_monitor_singlestep(xc_interface *xch, domid_t 
domain_id,
 
 return do_domctl(xch, &domctl);
 }
+
+int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id, bool enable,
+ bool sync)
+{
+DECLARE_DOMCTL;
+
+domctl.cmd = XEN_DOMCTL_monitor_op;
+domctl.domain = domain_id;
+domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE
+: XEN_DOMCTL_MONITOR_OP_DISABLE;
+domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST;
+domctl.u.monitor_op.u.guest_request.sync = sync;
+
+return do_domctl(xch, &domctl);
+}
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 5341937..17638ea 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -126,6 +126,22 @@ void hvm_event_msr(unsigned int msr, uint64_t value)
 hvm_event_traps(1, &req);
 }
 
+void hvm_event_guest_request(void)
+{
+struct vcpu *curr = current;
+struct arch_domain *currad = &curr->domain->arch;
+
+if ( currad->monitor.guest_request_enabled )
+{
+vm_event_request_t req = {
+.reason = VM_EVENT_REASON_GUEST_REQUEST,
+.vcpu_id = curr->vcpu_id,
+};
+
+hvm_event_traps(currad->monitor.guest_request_sync, &req);
+}
+}
+
 int hvm_event_int3(unsigned long gla)
 {
 int rc = 0;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..536d1c8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5974,7 +5974,6 @@ static int hvmop_get_param(
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
 {
 unsigned long start_iter, mask;
 long rc = 0;
@@ -6388,6 +6387,13 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case HVMOP_guest_request_vm_event:
+if ( guest_handle_is_null(arg) )
+hvm_event_guest_request();
+else
+rc = -EINVAL;
+break;
+
 default:
 {
 gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 896acf7..f8df7d2 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -161,6 +161,22 @@ int monitor_domctl(struct domain *d, struct 
xen_domctl_monitor_op *mop)
 break;
 }
 
+case XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST:
+{
+bool_t status = ad->monitor.guest_request_enabled;
+
+rc = status_check(mop, status);
+if ( rc )
+return rc;
+
+ad->monitor.guest_request_sync = mop->u.guest_request.sync;
+
+domain_pause(d);
+ad->monitor.guest_request_enabled = !status;
+domain_unpause(d);
+break;
+}
+
 default:
 return -EOPNOTSUPP;
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 7908844..f712caa 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -346,13

Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> When guest changes its interrupt configuration (such as, vector, etc.)
> for direct-assigned devices, we need to update the associated IRTE
> with the new guest vector, so external interrupts from the assigned
> devices can be injected to guests without VM-Exit.
> 
> For lowest-priority interrupts, we use vector-hashing mechanism to find
> the destination vCPU. This follows the hardware behavior, since modern
> Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> 
> For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> still use interrupt remapping.
> 
> Signed-off-by: Feng Wu 
> ---
> v3:
> - Use bitmap to store the all the possible destination vCPUs of an
> interrupt, then trying to find the right destination from the bitmap
> - Typo and some small changes
> 
>  xen/drivers/passthrough/io.c | 96
> +++-
>  1 file changed, 95 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index 9b77334..18e24e1 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  static DEFINE_PER_CPU(struct list_head, dpci_list);
> 
> @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
>  xfree(dpci);
>  }
> 
> +/*
> + * The purpose of this routine is to find the right destination vCPU for
> + * an interrupt which will be delivered by VT-d posted-interrupt. There
> + * are several cases as below:

If you intend this interface to be common to more uses, don't restrict the
comment to VT-d posted interrupts, which should be just one example.

> + *
> + * - For lowest-priority interrupts, we find the destination vCPU from the
> + *   guest vector using vector-hashing mechanism and return true. This 
> follows
> + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> + *   handle the lowest-priority interrupt.

Does AMD use the same hashing mechanism? Can this interface be reused by
any other IOMMU type, or is it an Intel-specific implementation?

> + * - Otherwise, for single destination interrupt, it is straightforward to
> + *   find the destination vCPU and return true.
> + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> + *   so return false.
> + *
> + *   Here is the details about the vector-hashing mechanism:
> + *   1. For lowest-priority interrupts, store all the possible destination
> + *  vCPUs in an array.
> + *   2. Use "gvec % max number of destination vCPUs" to find the right
> + *  destination vCPU in the array for the lowest-priority interrupt.
> + */
> +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> +  uint8_t dest_mode, uint8_t 
> delivery_mode,
> +  uint8_t gvec)
> +{
> +unsigned long *dest_vcpu_bitmap = NULL;
> +unsigned int dest_vcpu_num = 0, idx = 0;
> +int size = (d->max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
> +struct vcpu *v, *dest = NULL;
> +int i;
> +
> +dest_vcpu_bitmap = xzalloc_array(unsigned long, size);
> +if ( !dest_vcpu_bitmap )
> +{
> +dprintk(XENLOG_G_INFO,
> +"dom%d: failed to allocate memory\n", d->domain_id);
> +return NULL;
> +}
> +
> +for_each_vcpu ( d, v )
> +{
> +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> +dest_id, dest_mode) )
> +continue;
> +
> +__set_bit(v->vcpu_id, dest_vcpu_bitmap);
> +dest_vcpu_num++;
> +}
> +
> +if ( delivery_mode == dest_LowestPrio )
> +{
> +if (  dest_vcpu_num != 0 )
> +{

Having 'idx=0' here is more readable than initializing it earlier.

> +for ( i = 0; i <= gvec % dest_vcpu_num; i++)
> +idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) + 1;
> +idx--;
> +
> +BUG_ON(idx >= d->max_vcpus || idx < 0);

idx is an unsigned int, so it can't be < 0.
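
To illustrate the point, here is a minimal standalone sketch (not Xen code;
idx_in_range() and step_back() are hypothetical helpers made up for this
example). Because idx is unsigned, an underflow from the trailing "idx--"
shows up as a very large value, so the upper-bound check alone catches it and
a separate "idx < 0" test can never fire.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: bounds check for an
 * unsigned index. A separate "idx < 0" test would always be false.
 */
static bool idx_in_range(unsigned int idx, unsigned int max_vcpus)
{
    return idx < max_vcpus;
}

/* Mirrors the "idx--" after the loop; wraps to UINT_MAX when idx is 0. */
static unsigned int step_back(unsigned int idx)
{
    return idx - 1;
}
```

With this shape, BUG_ON(idx >= d->max_vcpus) would be enough on its own.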

> +dest = d->vcpu[idx];
> +}
> +}
> +else if (  dest_vcpu_num == 1 )

A comment would be welcome here to explain that this condition means a
fixed destination, while multicast/broadcast will have num as ZERO.

> +{
> +idx = find_first_bit(dest_vcpu_bitmap, d->max_vcpus);
> +BUG_ON(idx >= d->max_vcpus || idx < 0);
> +dest = d->vcpu[idx];
> +}
> +
> +xfree(dest_vcpu_bitmap);
> +
> +return dest;
> +}
> +
>  int pt_irq_create_bind(
>  struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
>  {
> @@ -257,7 +330,7 @@ int pt_irq_create_bind(
>  {
>  case PT_IRQ_TYPE_MSI:
>  {
> -uint8_t dest, dest_mode;
> +uint8_t dest, dest_mode, delivery_mode;
>  int dest_vcpu_id;
> 
>  if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> @@ -330,11 +403,32 @@ int 

Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 6:23 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> changes
> 
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > When guest changes its interrupt configuration (such as, vector, etc.)
> > for direct-assigned devices, we need to update the associated IRTE
> > with the new guest vector, so external interrupts from the assigned
> > devices can be injected to guests without VM-Exit.
> >
> > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > the destination vCPU. This follows the hardware behavior, since modern
> > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> >
> > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > still use interrupt remapping.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > - Use bitmap to store the all the possible destination vCPUs of an
> > interrupt, then trying to find the right destination from the bitmap
> > - Typo and some small changes
> >
> >  xen/drivers/passthrough/io.c | 96
> > +++-
> >  1 file changed, 95 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > index 9b77334..18e24e1 100644
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> >
> > @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> >  xfree(dpci);
> >  }
> >
> > +/*
> > + * The purpose of this routine is to find the right destination vCPU for
> > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > + * are several cases as below:
> 
> If you aim to have this interface common to more usages, don't restrict to
> VT-d posted-interrupt which should be just an example.

Yes, making this a common interface should be better.

> 
> > + *
> > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > + *   guest vector using vector-hashing mechanism and return true. This
> follows
> > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > + *   handle the lowest-priority interrupt.
> 
> Does AMD use same hashing mechanism? Can this interface be reused by
> other IOMMU type or it's an Intel specific implementation?

I am not sure how AMD handles lowest-priority interrupts. Intel hardware guys
told me that recent Intel hardware platforms use this method to deliver
lowest-priority interrupts. What do you mean by "other IOMMU type"?

Thanks,
Feng

> 
> > + * - Otherwise, for single destination interrupt, it is straightforward to
> > + *   find the destination vCPU and return true.
> > + * - For multicast/broadcast vCPU, we cannot handle it via interrupt 
> > posting,
> > + *   so return false.
> > + *
> > + *   Here is the details about the vector-hashing mechanism:
> > + *   1. For lowest-priority interrupts, store all the possible destination
> > + *  vCPUs in an array.
> > + *   2. Use "gvec % max number of destination vCPUs" to find the right
> > + *  destination vCPU in the array for the lowest-priority interrupt.
> > + */
> > +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > +  uint8_t dest_mode, uint8_t
> delivery_mode,
> > +  uint8_t gvec)
> > +{
> > +unsigned long *dest_vcpu_bitmap = NULL;
> > +unsigned int dest_vcpu_num = 0, idx = 0;
> > +int size = (d->max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
> > +struct vcpu *v, *dest = NULL;
> > +int i;
> > +
> > +dest_vcpu_bitmap = xzalloc_array(unsigned long, size);
> > +if ( !dest_vcpu_bitmap )
> > +{
> > +dprintk(XENLOG_G_INFO,
> > +"dom%d: failed to allocate memory\n", d->domain_id);
> > +return NULL;
> > +}
> > +
> > +for_each_vcpu ( d, v )
> > +{
> > +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > +dest_id, dest_mode) )
> > +continue;
> > +
> > +__set_bit(v->vcpu_id, dest_vcpu_bitmap);
> > +dest_vcpu_num++;
> > +}
> > +
> > +if ( delivery_mode == dest_LowestPrio )
> > +{
> > +if (  dest_vcpu_num != 0 )
> > +{
> 
> Having 'idx=0' here is more readable than initializing it earlier.
> 
> > +for ( i = 0; i <= gvec % dest_vcpu_num; i++)
> > +idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus,
> idx) + 1;
> > +idx--;
> > +
> > +BUG_ON(idx >= d->max_vcpus || idx < 0);
> 
> idx is unsigned int. can't <0
> 
> > +dest = d->vcpu[idx

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Tuesday, June 30, 2015 1:07 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; Tian, Kevin; Zhang, Yang Z;
> george.dun...@eu.citrix.com
> Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> 
> On 24/06/15 06:18, Feng Wu wrote:
> > This patch includes the following aspects:
> > - Add a global vector to wake up the blocked vCPU
> >   when an interrupt is being posted to it (This
> >   part was suggested by Yang Zhang).
> > - Adds a new per-vCPU tasklet to wakeup the blocked
> >   vCPU. It can be used in the case vcpu_unblock
> >   cannot be called directly.
> > - Define two per-cpu variables:
> >   * pi_blocked_vcpu:
> >   A list storing the vCPUs which were blocked on this pCPU.
> >
> >   * pi_blocked_vcpu_lock:
> >   The spinlock to protect pi_blocked_vcpu.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > - This patch is generated by merging the following three patches in v2:
> >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
> >[RFC v2 10/15] vmx: Define two per-cpu variables
> >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d
> Posted-Interrupts
> > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct 
> > arch_vmx_struct'
> > - rename 'vcpu_wakeup_tasklet_handler' to
> 'pi_vcpu_wakeup_tasklet_handler'
> > - Make pi_wakeup_interrupt() static
> > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> >
> >  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
> >  xen/arch/x86/hvm/vmx/vmx.c | 54
> ++
> >  xen/include/asm-x86/hvm/hvm.h  |  1 +
> >  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
> >  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
> >  5 files changed, 68 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index 11dc1b5..0c5ce3f 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
> >  if ( cpu_has_vmx_vpid )
> >  vpid_sync_all();
> >
> > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> >  return 0;
> >  }
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b94ef6a..7db6009 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr,
> uint64_t *msr_content);
> >  static int vmx_msr_write_intercept(unsigned int msr, uint64_t
> msr_content);
> >  static void vmx_invlpg_intercept(unsigned long vaddr);
> >
> > +/*
> > + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> > + * can find which vCPU should be woken up.
> > + */
> > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> > +
> >  uint8_t __read_mostly posted_intr_vector;
> > +uint8_t __read_mostly pi_wakeup_vector;
> > +
> > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> > +{
> > +vcpu_unblock((struct vcpu *)arg);
> > +}
> >
> >  static int vmx_domain_initialise(struct domain *d)
> >  {
> > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >  if ( v->vcpu_id == 0 )
> >  v->arch.user_regs.eax = 1;
> >
> > +tasklet_init(
> > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> > +pi_vcpu_wakeup_tasklet_handler,
> > +(unsigned long)v);
> 
> c/s f6dd295 indicates that the global tasklet lock causes a bottleneck
> when injecting interrupts, and replaced a tasklet with a softirq to fix
> the scalability issue.
> 
> I would expect exactly the bottleneck to exist here.

I am still considering this comment. Jan, what is your opinion about this?

Thanks,
Feng

> 
> > +
> > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +
> >  return 0;
> >  }
> >
> >  static void vmx_vcpu_destroy(struct vcpu *v)
> >  {
> > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
> >  /*
> >   * There are cases that domain still remains in log-dirty mode when it
> is
> >   * about to be destroyed (ex, user types 'xl destroy '), in which
> case
> > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
> >  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >  };
> >
> > +/*
> > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > + */
> > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +struct arch_vmx_struct *vmx;
> > +unsigned int cpu = smp_processor_id();
> > +

[Xen-devel] "x86, arm: remove asm/spinlock.h from all architectures" removed x86's _raw_read_unlock()

2015-07-08 Thread Jan Beulich
David,

I'm afraid we'll need another fixup here, even if things build fine
despite the removal.

Thanks, Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 12:36,  wrote:
>> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> Sent: Tuesday, June 30, 2015 1:07 AM
>> On 24/06/15 06:18, Feng Wu wrote:
>> > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >  if ( v->vcpu_id == 0 )
>> >  v->arch.user_regs.eax = 1;
>> >
>> > +tasklet_init(
>> > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
>> > +pi_vcpu_wakeup_tasklet_handler,
>> > +(unsigned long)v);
>> 
>> c/s f6dd295 indicates that the global tasklet lock causes a bottleneck
>> when injecting interrupts, and replaced a tasklet with a softirq to fix
>> the scalability issue.
>> 
>> I would expect exactly the bottleneck to exist here.
> 
> I am still considering this comments. Jan, what is your opinion about this?

"My opinion" here is that I expect you to respond to Andrew.

Jan




Re: [Xen-devel] "x86, arm: remove asm/spinlock.h from all architectures" removed x86's _raw_read_unlock()

2015-07-08 Thread David Vrabel
On 08/07/15 11:45, Jan Beulich wrote:
> David,
> 
> I'm afraid we'll need another fixup here, even if things build fine
> despite the removal.

Ah, we get a generic implementation instead.  Thanks for pointing this
out.  I'll fix it.

David



Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 11:07,  wrote:
> On 08/07/2015 09:56, Jan Beulich wrote:
>> --- a/xen/include/asm-arm/irq.h
>> +++ b/xen/include/asm-arm/irq.h
>> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,
>>
>>   void arch_move_irqs(struct vcpu *v);
>>
>> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
>> +
> 
> This addition is here in order to ensure that d and pirq are evaluated, 
> right?

Sure.

> If so, I didn't find it obvious to understand. Why didn't you use a 
> static inline? Or maybe add a comment explicitly say this is not 
> implemented.

A static inline could be used in this case, yes. But I see no
significant advantages. As to the comment - it is implemented,
it's just a no-op. And stating that it is a no-op would be
redundant with it obviously being so by looking at it.
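
For readers unfamiliar with the idiom, a small standalone sketch may help.
Only the macro itself is taken from the patch; struct domain is reduced to a
placeholder here, and next_pirq() and demo() are hypothetical helpers. The
sketch shows that both arguments are still evaluated exactly once (and count
as used for the compiler), while the result is discarded by the (void) cast.

```c
/* Placeholder type; the real struct domain is far larger. */
struct domain { int domain_id; };

/* Same shape as the ARM stub in the patch: uses both arguments, does nothing. */
#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))

static int calls;

/* Hypothetical helper with a visible side effect, for illustration only. */
static int next_pirq(void)
{
    return ++calls;
}

/* Invokes the no-op hook once and reports how often next_pirq() has run. */
static int demo(struct domain *d)
{
    arch_evtchn_bind_pirq(d, next_pirq());
    return calls;
}
```

Each call to demo() bumps the counter by one, which is what one would expect
from a real function call with the same arguments.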

Jan




Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 11:39,  wrote:
> On 08/07/15 09:56, Jan Beulich wrote:
>> Rather than assuming only PV guests need special treatment (and
>> dealing with that directly when an IRQ gets set up), keep all guest MSI
>> IRQs masked until either the (HVM) guest unmasks them via vMSI or the
>> (PV, PVHVM, or PVH) guest sets up an event channel for it.
>> 
>> To not further clutter the common evtchn_bind_pirq() with x86-specific
>> code, introduce an arch_evtchn_bind_pirq() hook instead.
> 
> Can you describe the symptoms of the bug being fixed here?

Interrupts simply didn't get unmasked for PVHVM Linux guests.

>> --- a/xen/include/asm-arm/irq.h
>> +++ b/xen/include/asm-arm/irq.h
>> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d, 
>>  
>>  void arch_move_irqs(struct vcpu *v);
>>  
>> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
> 
> Would this be better as a inline function?
> 
>> +
>>  /* Set IRQ type for an SPI */
>>  int irq_set_spi_type(unsigned int spi, unsigned int type);
>>  
>> --- a/xen/include/xen/irq.h
>> +++ b/xen/include/xen/irq.h
>> @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
>>  unsigned int arch_hwdom_irqs(domid_t);
>>  #endif
>>  
>> +#ifndef arch_evtchn_bind_pirq
>> +void arch_evtchn_bind_pirq(struct domain *, int pirq);
> 
> ... moving this into xen/include/asm-x86/irq.h

Oh, right, (also to Julien) - this is exactly the reason I do not want it
to be an inline function for ARM: I want the declaration here, not
replicated in every interested arch's header.
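
A rough single-file sketch of that layout, under stated assumptions: the
"header" boundaries are only marked by comments, struct domain is a
placeholder, and last_bound_pirq and bind_pirq_common() are hypothetical
names added so the effect is observable. It takes the x86-like path, where
the arch does not define the macro and the common declaration applies.

```c
/* Placeholder type; the real struct domain is far larger. */
struct domain { int domain_id; };

/*
 * Arch header: an architecture with nothing to do would define the hook as
 * a macro here, e.g.
 *   #define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
 * This sketch takes the other path and leaves it undefined.
 */

/* Common header: one declaration, seen only when no macro override exists. */
#ifndef arch_evtchn_bind_pirq
void arch_evtchn_bind_pirq(struct domain *, int pirq);
#endif

/* Arch source file: the out-of-line implementation (placeholder body). */
static int last_bound_pirq = -1;

void arch_evtchn_bind_pirq(struct domain *d, int pirq)
{
    (void)d;
    last_bound_pirq = pirq;
}

/* Common code calls the hook the same way on every architecture. */
static int bind_pirq_common(struct domain *d, int pirq)
{
    arch_evtchn_bind_pirq(d, pirq);
    return last_bound_pirq;
}
```

The declaration lives in one place, and an arch opts out simply by defining
the macro before the common header is included.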

Jan




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> This patch includes the following aspects:
> - Add a global vector to wake up the blocked vCPU
>   when an interrupt is being posted to it (This
>   part was suggested by Yang Zhang).
> - Adds a new per-vCPU tasklet to wakeup the blocked
>   vCPU. It can be used in the case vcpu_unblock
>   cannot be called directly.
> - Define two per-cpu variables:
>   * pi_blocked_vcpu:
>   A list storing the vCPUs which were blocked on this pCPU.
> 
>   * pi_blocked_vcpu_lock:
>   The spinlock to protect pi_blocked_vcpu.
> 
> Signed-off-by: Feng Wu 
> ---
> v3:
> - This patch is generated by merging the following three patches in v2:
>[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
>[RFC v2 10/15] vmx: Define two per-cpu variables
>[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> 
>  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
>  xen/arch/x86/hvm/vmx/vmx.c | 54
> ++
>  xen/include/asm-x86/hvm/hvm.h  |  1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
>  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
>  5 files changed, 68 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 11dc1b5..0c5ce3f 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
>  if ( cpu_has_vmx_vpid )
>  vpid_sync_all();
> 
> +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
>  return 0;
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b94ef6a..7db6009 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, 
> uint64_t
> *msr_content);
>  static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
> 
> +/*
> + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> +{
> +vcpu_unblock((struct vcpu *)arg);
> +}
> 
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  if ( v->vcpu_id == 0 )
>  v->arch.user_regs.eax = 1;
> 
> +tasklet_init(
> +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> +pi_vcpu_wakeup_tasklet_handler,
> +(unsigned long)v);
> +
> +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
>  return 0;
>  }
> 
>  static void vmx_vcpu_destroy(struct vcpu *v)
>  {
> +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
>  /*
>   * There are cases that domain still remains in log-dirty mode when it is
>   * about to be destroyed (ex, user types 'xl destroy '), in which 
> case
> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>  };
> 
> +/*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +struct arch_vmx_struct *vmx;
> +unsigned int cpu = smp_processor_id();
> +
> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
> +/*
> + * FIXME: The length of the list depends on how many
> + * vCPU is current blocked on this specific pCPU.
> + * This may hurt the interrupt latency if the list
> + * grows to too many entries.
> + */

Let's go with this linked list first until a real issue is identified.

> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> +pi_blocked_vcpu_list)
> +if ( vmx->pi_desc.on )
> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

Not sure where the vcpu is removed from the list (possibly in a later patch).
But at least removing the vcpu from the list at this point should be safe and
the right way to go. IIRC Andrew and other guys raised a similar concern
earlier. :-)
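
A minimal standalone sketch of "remove while walking", assuming nothing about
the later patches: the list type below is a tiny stand-in for Xen's
list_head, struct blocked_vcpu and its fields are hypothetical, and locking
is omitted (in the patch the walk runs under pi_blocked_vcpu_lock). The key
point is that the successor is saved before the current entry is unlinked.

```c
#include <stddef.h>

/* Minimal intrusive doubly-linked list, standing in for Xen's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next; n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical stand-in for the per-vCPU state referenced by the patch. */
struct blocked_vcpu {
    struct list_node link;
    int on;      /* models pi_desc.on: an interrupt has been posted */
    int woken;   /* models "wakeup has been scheduled" */
};

/* Walk the list; unlink and wake each entry with a pending notification. */
static unsigned int wake_pending(struct list_node *head)
{
    struct list_node *pos, *next;
    unsigned int nr = 0;

    /* Save 'next' before unlinking so the walk survives the removal. */
    for ( pos = head->next; pos != head; pos = next )
    {
        struct blocked_vcpu *b = (struct blocked_vcpu *)
            ((char *)pos - offsetof(struct blocked_vcpu, link));

        next = pos->next;
        if ( !b->on )
            continue;

        list_del_node(pos);
        b->woken = 1;
        nr++;
    }

    return nr;
}
```

Entries without a pending notification stay on the list, so a second walk
over the same list finds nothing left to wake.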

Thanks
Kevin


Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Julien Grall



On 08/07/2015 11:55, Jan Beulich wrote:
>>>> On 08.07.15 at 11:07,  wrote:
>> On 08/07/2015 09:56, Jan Beulich wrote:
>>> --- a/xen/include/asm-arm/irq.h
>>> +++ b/xen/include/asm-arm/irq.h
>>> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,
>>>
>>>   void arch_move_irqs(struct vcpu *v);
>>>
>>> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
>>> +
>>
>> This addition is here in order to ensure that d and pirq are evaluated,
>> right?
>
> Sure.
>
>> If so, I didn't find it obvious to understand. Why didn't you use a
>> static inline? Or maybe add a comment explicitly say this is not
>> implemented.
>
> A static inline could be used in this case, yes. But I see no
> significant advantages. As to the comment - it is implemented,
> it's just a no-op. And stating that it is a no-op would be
> redundant with it obviously being so by looking at it.

It's not that obvious, given that I had to ask about it.

The first thing I saw was (d) + (pirq), and I thought: "Why do we want to
add a domain to a pirq?". I only noticed the (void) afterwards, and understood
it just because I remembered that we talked about a similar case a year ago.

Having a comment doesn't hurt and helps comprehension.

--
Julien Grall



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 7:00 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> 
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > This patch includes the following aspects:
> > - Add a global vector to wake up the blocked vCPU
> >   when an interrupt is being posted to it (This
> >   part was suggested by Yang Zhang).
> > - Adds a new per-vCPU tasklet to wakeup the blocked
> >   vCPU. It can be used in the case vcpu_unblock
> >   cannot be called directly.
> > - Define two per-cpu variables:
> >   * pi_blocked_vcpu:
> >   A list storing the vCPUs which were blocked on this pCPU.
> >
> >   * pi_blocked_vcpu_lock:
> >   The spinlock to protect pi_blocked_vcpu.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > - This patch is generated by merging the following three patches in v2:
> >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
> >[RFC v2 10/15] vmx: Define two per-cpu variables
> >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d
> Posted-Interrupts
> > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct 
> > arch_vmx_struct'
> > - rename 'vcpu_wakeup_tasklet_handler' to
> 'pi_vcpu_wakeup_tasklet_handler'
> > - Make pi_wakeup_interrupt() static
> > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> >
> >  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
> >  xen/arch/x86/hvm/vmx/vmx.c | 54
> > ++
> >  xen/include/asm-x86/hvm/hvm.h  |  1 +
> >  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
> >  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
> >  5 files changed, 68 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index 11dc1b5..0c5ce3f 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
> >  if ( cpu_has_vmx_vpid )
> >  vpid_sync_all();
> >
> > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> >  return 0;
> >  }
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b94ef6a..7db6009 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr,
> uint64_t
> > *msr_content);
> >  static int vmx_msr_write_intercept(unsigned int msr, uint64_t
> msr_content);
> >  static void vmx_invlpg_intercept(unsigned long vaddr);
> >
> > +/*
> > + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> > + * can find which vCPU should be woken up.
> > + */
> > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> > +
> >  uint8_t __read_mostly posted_intr_vector;
> > +uint8_t __read_mostly pi_wakeup_vector;
> > +
> > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> > +{
> > +vcpu_unblock((struct vcpu *)arg);
> > +}
> >
> >  static int vmx_domain_initialise(struct domain *d)
> >  {
> > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >  if ( v->vcpu_id == 0 )
> >  v->arch.user_regs.eax = 1;
> >
> > +tasklet_init(
> > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> > +pi_vcpu_wakeup_tasklet_handler,
> > +(unsigned long)v);
> > +
> > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +
> >  return 0;
> >  }
> >
> >  static void vmx_vcpu_destroy(struct vcpu *v)
> >  {
> > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
> >  /*
> >   * There are cases that domain still remains in log-dirty mode when it
> is
> >   * about to be destroyed (ex, user types 'xl destroy '), in which
> case
> > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> > vmx_function_table = {
> >  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >  };
> >
> > +/*
> > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > + */
> > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +struct arch_vmx_struct *vmx;
> > +unsigned int cpu = smp_processor_id();
> > +
> > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> > +/*
> > + * FIXME: The length of the list depends on how many
> > + * vCPU is current blocked on this specific pCPU.
> > + * This may hurt the interrupt latency if the list
> > + * grows to too many entries.
> > + */
> 

Re: [Xen-devel] [v3 13/15] vmx: Properly handle notification event when vCPU is running

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> When a vCPU is running in Root mode and a notification event
> has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
> the current cpu, so the pending interrupt in PIRR will be
> synced to vIRR before VM-Exit in time.
> 
> Signed-off-by: Feng Wu 

Acked-by: Kevin Tian 



Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Wu, Feng
> Sent: Wednesday, July 08, 2015 6:32 PM
> To: Tian, Kevin; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com; Wu, Feng
> Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> changes
> 
> 
> 
> > -Original Message-
> > From: Tian, Kevin
> > Sent: Wednesday, July 08, 2015 6:23 PM
> > To: Wu, Feng; xen-devel@lists.xen.org
> > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> > Yang Z; george.dun...@eu.citrix.com
> > Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> > changes
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, June 24, 2015 1:18 PM
> > >
> > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > for direct-assigned devices, we need to update the associated IRTE
> > > with the new guest vector, so external interrupts from the assigned
> > > devices can be injected to guests without VM-Exit.
> > >
> > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > the destination vCPU. This follows the hardware behavior, since modern
> > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > >
> > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > still use interrupt remapping.
> > >
> > > Signed-off-by: Feng Wu 
> > > ---
> > > v3:
> > > - Use bitmap to store the all the possible destination vCPUs of an
> > > interrupt, then trying to find the right destination from the bitmap
> > > - Typo and some small changes
> > >
> > >  xen/drivers/passthrough/io.c | 96
> > > +++-
> > >  1 file changed, 95 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > index 9b77334..18e24e1 100644
> > > --- a/xen/drivers/passthrough/io.c
> > > +++ b/xen/drivers/passthrough/io.c
> > > @@ -26,6 +26,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >
> > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > >
> > > @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
> *dpci)
> > >  xfree(dpci);
> > >  }
> > >
> > > +/*
> > > + * The purpose of this routine is to find the right destination vCPU for
> > > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > > + * are several cases as below:
> >
> > If you aim to have this interface common to more usages, don't restrict to
> > VT-d posted-interrupt which should be just an example.
> 
> Yes, making this a common interface should be better.

Thinking about this a little more, this function itself is kind of restricted to
VT-d posted-interrupt, since it doesn't handle multicast/broadcast interrupts;
it only handles lowest-priority and single-destination interrupts. However, I
can make the vector-hashing logic a separate function, which can be
used elsewhere.
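
One possible shape for such a helper, as a standalone sketch rather than the
actual follow-up patch: vector_hash_pick() and its parameters are
hypothetical names, and the bitmap is scanned by hand instead of using Xen's
find_next_bit(). It returns the index of the (gvec % nr_dest)-th set bit,
which is the selection rule described in the patch comment.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical standalone helper (names are illustrative, not from the
 * patch). Returns the index of the (gvec % nr_dest)-th set bit in 'bitmap',
 * counting from zero, or -1 if there is no candidate.
 */
static int vector_hash_pick(const unsigned long *bitmap, unsigned int nbits,
                            unsigned int nr_dest, unsigned char gvec)
{
    unsigned int target, seen = 0, bit;

    if ( nr_dest == 0 )
        return -1;

    target = gvec % nr_dest;

    for ( bit = 0; bit < nbits; bit++ )
    {
        if ( !(bitmap[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG))) )
            continue;
        if ( seen++ == target )
            return (int)bit;
    }

    return -1;
}
```

Returning -1 for an empty candidate set leaves the fallback decision (for
example, using interrupt remapping instead) to the caller.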

Thanks,
Feng

> 
> >
> > > + *
> > > + * - For lowest-priority interrupts, we find the destination vCPU from 
> > > the
> > > + *   guest vector using vector-hashing mechanism and return true. This
> > follows
> > > + *   the hardware behavior, since modern Intel CPUs use vector hashing
> to
> > > + *   handle the lowest-priority interrupt.
> >
> > Does AMD use same hashing mechanism? Can this interface be reused by
> > other IOMMU type or it's an Intel specific implementation?
> 
> I am not sure how AMD handle lowest-priority. Intel hardware guys told me
> recent Intel hardware platform use this method to deliver lowest-priority
> interrupts. What do you mean by "other IOMMU type"?
> 
> Thanks,
> Feng
> 
> >
> > > + * - Otherwise, for single destination interrupt, it is straightforward 
> > > to
> > > + *   find the destination vCPU and return true.
> > > + * - For multicast/broadcast vCPU, we cannot handle it via interrupt
> posting,
> > > + *   so return false.
> > > + *
> > > + *   Here is the details about the vector-hashing mechanism:
> > > + *   1. For lowest-priority interrupts, store all the possible 
> > > destination
> > > + *  vCPUs in an array.
> > > + *   2. Use "gvec % max number of destination vCPUs" to find the right
> > > + *  destination vCPU in the array for the lowest-priority interrupt.
> > > + */
> > > +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > > +  uint8_t dest_mode, uint8_t
> > delivery_mode,
> > > +  uint8_t gvec)
> > > +{
> > > +unsigned long *dest_vcpu_bitmap = NULL;
> > > +unsigned int dest_vcpu_num = 0, idx = 0;
> > > +int size = (d->max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
> > > +struct vcpu *v, *dest = NULL;
> > > +int i;
> > > +
> > > +dest_vcpu_bitmap = xzalloc_array(unsigned long, size);
> > > +if ( !dest_vcpu_bitmap )
> > > +{
> > > +dprintk(XENLOG_G_INFO,

Re: [Xen-devel] [PATCH] xen: Use module_pci_driver() in platform pci driver.

2015-07-08 Thread David Vrabel
On 08/07/15 06:54, Rajat Jain wrote:
> Eliminate the module_init function by using module_pci_driver()

This is not equivalent since this adds a useless module_exit() function.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread David Vrabel
On 08/07/15 11:58, Jan Beulich wrote:
 On 08.07.15 at 11:39,  wrote:
>> On 08/07/15 09:56, Jan Beulich wrote:
>>> Rather than assuming only PV guests need special treatment (and
>>> dealing with that directly when an IRQ gets set up), keep all guest MSI
>>> IRQs masked until either the (HVM) guest unmasks them via vMSI or the
>>> (PV, PVHVM, or PVH) guest sets up an event channel for it.
>>>
>>> To not further clutter the common evtchn_bind_pirq() with x86-specific
>>> code, introduce an arch_evtchn_bind_pirq() hook instead.
>>
>> Can you describe the symptoms of the bug being fixed here?
> 
> Interrupts simply didn't get unmasked for PVHVM Linux guests.
> 
>>> --- a/xen/include/asm-arm/irq.h
>>> +++ b/xen/include/asm-arm/irq.h
>>> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d, 
>>>  
>>>  void arch_move_irqs(struct vcpu *v);
>>>  
>>> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
>>
>> Would this be better as an inline function?
>>
>>> +
>>>  /* Set IRQ type for an SPI */
>>>  int irq_set_spi_type(unsigned int spi, unsigned int type);
>>>  
>>> --- a/xen/include/xen/irq.h
>>> +++ b/xen/include/xen/irq.h
>>> @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
>>>  unsigned int arch_hwdom_irqs(domid_t);
>>>  #endif
>>>  
>>> +#ifndef arch_evtchn_bind_pirq
>>> +void arch_evtchn_bind_pirq(struct domain *, int pirq);
>>
>> ... moving this into xen/include/asm-x86/irq.h
> 
> Oh, right, (also to Julien) - this is exactly the reason I do not want it
> to be an inline function for ARM: I want the declaration here, not
> replicated in every interested arch's header.

Ok.

FWIW, with this requirement I would (instead of the macros) add a weak
arch_evtchn_bind_pirq() that's a no-op.

David
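
For illustration, a minimal sketch of the weak no-op approach suggested above. It assumes a GCC/Clang-style `__attribute__((weak))`; `bind_pirq_sketch()` is a hypothetical stand-in for the common caller and is not Xen code.

```c
#include <assert.h>
#include <stddef.h>

struct domain;                      /* opaque here; only pointers are used */

/*
 * Common code: weak default that does nothing. An architecture needing
 * special handling supplies a strong definition with the same signature,
 * and the linker picks that one instead.
 */
__attribute__((weak)) void arch_evtchn_bind_pirq(struct domain *d, int pirq)
{
    (void)d;
    (void)pirq;
}

/* Hypothetical caller standing in for the common evtchn_bind_pirq(). */
static int bind_pirq_sketch(struct domain *d, int pirq)
{
    if ( pirq < 0 )
        return -1;

    /* Always safe to call: resolves to the weak no-op or an arch override. */
    arch_evtchn_bind_pirq(d, pirq);

    return 0;
}
```

With this shape the declaration lives in one common header, and an architecture without special needs adds nothing at all.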



Re: [Xen-devel] [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> The basic idea here is:
> 1. When vCPU's state is RUNSTATE_running,
> - set 'NV' to 'Notification Vector'.
> - Clear 'SN' to accept PI.
> - set 'NDST' to the right pCPU.
> 2. When vCPU's state is RUNSTATE_blocked,
> - set 'NV' to 'Wake-up Vector', so we can wake up the
>   related vCPU when posted-interrupt happens for it.
> - Clear 'SN' to accept PI.
> 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> - Set 'SN' to suppress non-urgent interrupts.
>   (Currently, we only support non-urgent interrupts)
> - Set 'NV' back to 'Notification Vector' if needed.
> 
> Signed-off-by: Feng Wu 

Acked-by: Kevin Tian 
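
The three cases in the quoted description can be sketched as a small state update. This is only an illustration: `struct pi_desc_sketch`, `pi_update()` and both vector values are hypothetical, the fields are plain integers rather than the real descriptor bit layout, and the atomic updates the real descriptor would need are left out.

```c
#include <assert.h>
#include <stdint.h>

enum runstate {
    RUNSTATE_running,
    RUNSTATE_runnable,
    RUNSTATE_blocked,
    RUNSTATE_offline
};

/* Placeholder vector numbers, chosen only for this sketch. */
#define NOTIFICATION_VECTOR 0xf2
#define WAKEUP_VECTOR       0xf1

/* Simplified stand-in for the posted-interrupt descriptor fields. */
struct pi_desc_sketch {
    uint8_t  nv;    /* notification vector */
    uint8_t  sn;    /* suppress notification */
    uint32_t ndst;  /* notification destination (pCPU) */
};

static void pi_update(struct pi_desc_sketch *pi, enum runstate s,
                      uint32_t pcpu)
{
    switch ( s )
    {
    case RUNSTATE_running:
        pi->nv = NOTIFICATION_VECTOR;
        pi->sn = 0;                 /* accept posted interrupts */
        pi->ndst = pcpu;
        break;

    case RUNSTATE_blocked:
        pi->nv = WAKEUP_VECTOR;     /* so the vCPU can be woken up */
        pi->sn = 0;
        break;

    case RUNSTATE_runnable:
    case RUNSTATE_offline:
        pi->sn = 1;                 /* suppress non-urgent interrupts */
        /* Set unconditionally here; the description says "if needed". */
        pi->nv = NOTIFICATION_VECTOR;
        break;
    }
}
```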




Re: [Xen-devel] [v3 15/15] Add a command line parameter for VT-d posted-interrupts

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> Enable VT-d Posted-Interrupts and add a command line
> parameter for it.
> 
> Signed-off-by: Feng Wu 
> ---
> v3:
> Remove the redundant "no intremp then no intpost" logic
> 
>  docs/misc/xen-command-line.markdown | 9 -
>  xen/drivers/passthrough/iommu.c | 4 +++-
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown
> b/docs/misc/xen-command-line.markdown
> index aa684c0..f8ec15f 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -875,6 +875,13 @@ debug hypervisor only).
>  >> Control the use of interrupt remapping (DMA remapping will always be 
> enabled
>  >> if IOMMU functionality is enabled).
> 
> +> `intpost`
> +
> +> Default: `true`
> +
> +>> Control the use of interrupt posting, interrupt posting is dependant on
> +>> interrupt remapping.

"Control the use of interrupt posting, which depends on the availability of 
interrupt remapping."

> +
>  > `qinval` (VT-d)
> 
>  > Default: `true`
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 597f676..e13251c 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -52,7 +52,7 @@ bool_t __read_mostly iommu_passthrough;
>  bool_t __read_mostly iommu_snoop = 1;
>  bool_t __read_mostly iommu_qinval = 1;
>  bool_t __read_mostly iommu_intremap = 1;
> -bool_t __read_mostly iommu_intpost;
> +bool_t __read_mostly iommu_intpost = 1;
>  bool_t __read_mostly iommu_hap_pt_share = 1;
>  bool_t __read_mostly iommu_debug;
>  bool_t __read_mostly amd_iommu_perdev_intremap = 1;
> @@ -97,6 +97,8 @@ static void __init parse_iommu_param(char *s)
>  iommu_qinval = val;
>  else if ( !strcmp(s, "intremap") )
>  iommu_intremap = val;
> +else if ( !strcmp(s, "intpost") )
> +iommu_intpost = val;
>  else if ( !strcmp(s, "debug") )
>  {
>  iommu_debug = val;
> --
> 2.1.0

Reviewed-by: Kevin Tian 



Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, July 08, 2015 6:11 PM
> > From: Tian, Kevin
> > Sent: Wednesday, July 08, 2015 5:06 PM
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, June 24, 2015 1:18 PM
> > >
> > > Currently, we don't support urgent interrupt, all interrupts
> > > are recognized as non-urgent interrupt, so we cannot send
> > > posted-interrupt when 'SN' is set.
> > >
> > > Signed-off-by: Feng Wu 
> > > ---
> > > v3:
> > > use cmpxchg to test SN/ON and set ON
> > >
> > >  xen/arch/x86/hvm/vmx/vmx.c | 32
> 
> > >  1 file changed, 28 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > index 0837627..b94ef6a 100644
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct
> > vcpu *v)
> > >
> > >  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> > >  {
> > > +struct pi_desc old, new, prev;
> > > +
> >
> > move to 'else if'.
> >
> > >  if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> > >  return;
> > >
> > > @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu
> > *v, u8
> > > vector)
> > >   */
> > >  pi_set_on(&v->arch.hvm_vmx.pi_desc);
> > >  }
> > > -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> > > +else
> > >  {
> > > +prev.control = 0;
> > > +
> > > +do {
> > > +old.control = v->arch.hvm_vmx.pi_desc.control &
> > > +  ~(1 << POSTED_INTR_ON | 1 <<
> > POSTED_INTR_SN);
> > > +new.control = v->arch.hvm_vmx.pi_desc.control |
> > > +  1 << POSTED_INTR_ON;
> > > +
> > > +/*
> > > + * Currently, we don't support urgent interrupt, all
> > > + * interrupts are recognized as non-urgent interrupt,
> > > + * so we cannot send posted-interrupt when 'SN' is set.
> > > + * Besides that, if 'ON' is already set, we cannot set
> > > + * posted-interrupts as well.
> > > + */
> > > +if ( prev.sn || prev.on )
> > > +{
> > > +vcpu_kick(v);
> > > +return;
> > > +}
> >
> > would it make more sense to move above check after cmpxchg?
> 
> My original idea was that we only need to do the check when
> prev.control != old.control, which means the cmpxchg did not
> complete successfully. If we add the check between the cmpxchg
> and while ( prev.control != old.control ), the logic seems
> less clear, since we don't need to check prev.sn and prev.on
> when cmpxchg succeeds in setting the new value.
> 
> Thanks,
> Feng
> 

Then it'd be clearer if you move the check to the start of the loop, so
you can avoid two additional reads when prev.on/sn is set. :-)

Thanks
Kevin
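
To make the suggestion concrete, here is a rough sketch of such a loop with the check at the top. It uses C11 atomics on a bare 64-bit control word instead of Xen's cmpxchg() and struct pi_desc; `pi_try_set_on()` is a hypothetical name, and the bit positions are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POSTED_INTR_ON 0   /* bit positions assumed for this sketch */
#define POSTED_INTR_SN 1

/*
 * Try to set ON. Returns true if this call set ON (the caller would then
 * send the notification), false if ON or SN was already set (the caller
 * would fall back to kicking the vCPU).
 */
static bool pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        /* Check first, so nothing else is computed when SN/ON is set. */
        if ( old & ((1ULL << POSTED_INTR_ON) | (1ULL << POSTED_INTR_SN)) )
            return false;
        /* On failure the compare-exchange refreshes 'old' for the retry. */
    } while ( !atomic_compare_exchange_weak(control, &old,
                                            old | (1ULL << POSTED_INTR_ON)) );

    return true;
}
```

A failed compare-exchange hands back the current value, so the next iteration re-checks SN/ON without a separate read of the descriptor.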



Re: [Xen-devel] [PATCH v5 2/3] arm: Allow the user to specify the GIC version

2015-07-08 Thread Ian Campbell
On Wed, 2015-07-08 at 11:17 +0100, Ian Campbell wrote:
> On Tue, 2015-07-07 at 17:22 +0100, Julien Grall wrote:
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index e1632fa..11f6461 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -369,6 +369,12 @@ libxl_vnode_info = Struct("vnode_info", [
> >  ("vcpus", libxl_bitmap), # vcpus in this node
> >  ])
> >  
> > +libxl_gic_version = Enumeration("gic_version", [
> > +(0, "DEFAULT"),
> > +(0x20, "v2"),
> > +(0x30, "v3")
> > +], init_val = "LIBXL_GIC_VERSION_DEFAULT")
> > +
> >  libxl_domain_build_info = Struct("domain_build_info",[
> >  ("max_vcpus",   integer),
> >  ("avail_vcpus", libxl_bitmap),
> > @@ -480,6 +486,11 @@ libxl_domain_build_info = Struct("domain_build_info",[
> >])),
> >   ("invalid", None),
> >   ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
> > +
> > +
> > +("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
> > +  ])),
> > +
> >  ], dir=DIR_IN
> 
> This results in the following when building the ocaml bindings:
> 
> Traceback (most recent call last):
>   File "genwrap.py", line 529, in 
> ml.write(gen_ocaml_ml(ty, False))
>   File "genwrap.py", line 217, in gen_ocaml_ml
> s += gen_struct(ty)
>   File "genwrap.py", line 119, in gen_struct
> x = ocaml_instance_of_field(f)
>   File "genwrap.py", line 112, in ocaml_instance_of_field
> return "%s : %s" % (munge_name(name), ocaml_type_of(f.type))
>   File "genwrap.py", line 90, in ocaml_type_of
> return ty.rawname.capitalize() + ".t"
> AttributeError: 'NoneType' object has no attribute 'capitalize'
> make[7]: *** No rule to make target '_libxl_types.ml.in', needed by 
> 'xenlight.ml'.  Stop.
> 
> I'll take a look.

I have a patch to genwrap.py which results in the following diff to the
generated ml files for the anonymous sub-struct added by the IDL change
above.

Dave/Euan/Rob, is that idiomatic ocaml or is it possible to have
anonymous structs in ocaml like it is in C?

If there is a better/more usual way to do this would you mind supplying
me with the ocaml I should be aiming for please?

Ian.

--- tools/ocaml/libs/xl/_libxl_BACKUP_types.ml.in   2015-07-08 11:22:35.0 +0100
+++ tools/ocaml/libs/xl/_libxl_types.ml.in  2015-07-08 12:25:56.0 +0100
@@ -508,6 +508,17 @@ module Vnode_info = struct
external default : ctx -> unit -> t = "stub_libxl_vnode_info_init"
 end
 
+(* libxl_gic_version implementation *)
+type gic_version = 
+| GIC_VERSION_DEFAULT
+| GIC_VERSION_V2
+| GIC_VERSION_V3
+
+let string_of_gic_version = function
+   | GIC_VERSION_DEFAULT -> "DEFAULT"
+   | GIC_VERSION_V2 -> "V2"
+   | GIC_VERSION_V3 -> "V3"
+
 (* libxl_domain_build_info implementation *)
 module Domain_build_info = struct
 
@@ -566,6 +577,10 @@ module Domain_build_info = struct

type type__union = Hvm of type_hvm | Pv of type_pv | Invalid

+   type arch_arm__anon = {
+   gic_version : gic_version;
+   }
+   
type t =
{
max_vcpus : int;
@@ -607,6 +622,7 @@ module Domain_build_info = struct
ramdisk : string option;
device_tree : string option;
xl_type : type__union;
+   arch_arm : arch_arm__anon;
}
external default : ctx -> ?xl_type:domain_type -> unit -> t = 
"stub_libxl_domain_build_info_init"
 end






Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, July 08, 2015 6:32 PM
> 
> 
> 
> > -Original Message-
> > From: Tian, Kevin
> > Sent: Wednesday, July 08, 2015 6:23 PM
> > To: Wu, Feng; xen-devel@lists.xen.org
> > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> > Yang Z; george.dun...@eu.citrix.com
> > Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> > changes
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, June 24, 2015 1:18 PM
> > >
> > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > for direct-assigned devices, we need to update the associated IRTE
> > > with the new guest vector, so external interrupts from the assigned
> > > devices can be injected to guests without VM-Exit.
> > >
> > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > the destination vCPU. This follows the hardware behavior, since modern
> > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > >
> > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > still use interrupt remapping.
> > >
> > > Signed-off-by: Feng Wu 
> > > ---
> > > v3:
> > > - Use bitmap to store the all the possible destination vCPUs of an
> > > interrupt, then trying to find the right destination from the bitmap
> > > - Typo and some small changes
> > >
> > >  xen/drivers/passthrough/io.c | 96
> > > +++-
> > >  1 file changed, 95 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > index 9b77334..18e24e1 100644
> > > --- a/xen/drivers/passthrough/io.c
> > > +++ b/xen/drivers/passthrough/io.c
> > > @@ -26,6 +26,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >
> > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > >
> > > @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> > >  xfree(dpci);
> > >  }
> > >
> > > +/*
> > > + * The purpose of this routine is to find the right destination vCPU for
> > > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > > + * are several cases as below:
> >
> > If you aim to have this interface common to more usages, don't restrict to
> > VT-d posted-interrupt which should be just an example.
> 
> Yes, making this a common interface should be better.
> 
> >
> > > + *
> > > + * - For lowest-priority interrupts, we find the destination vCPU from 
> > > the
> > > + *   guest vector using vector-hashing mechanism and return true. This
> > follows
> > > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > > + *   handle the lowest-priority interrupt.
> >
> > Does AMD use same hashing mechanism? Can this interface be reused by
> > other IOMMU type or it's an Intel specific implementation?
> 
> I am not sure how AMD handle lowest-priority. Intel hardware guys told me
> recent Intel hardware platform use this method to deliver lowest-priority
> interrupts. What do you mean by "other IOMMU type"?
> 

The OS makes no assumption about how vector hashing is done at the hardware
level, so it should be fine to use Intel's algorithm in this emulation path.
However, my point is just about the comment "since modern Intel CPUs use vector
hashing to handle the lowest-priority interrupt". We don't do this because Intel
does so; using Intel's algorithm here is an implementation choice on your part.

Thanks
Kevin



Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, July 08, 2015 7:05 PM
> 
> 
> > -Original Message-
> > From: Wu, Feng
> > Sent: Wednesday, July 08, 2015 6:32 PM
> > To: Tian, Kevin; xen-devel@lists.xen.org
> > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> > Yang Z; george.dun...@eu.citrix.com; Wu, Feng
> > Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> > changes
> >
> >
> >
> > > -Original Message-
> > > From: Tian, Kevin
> > > Sent: Wednesday, July 08, 2015 6:23 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> > > Yang Z; george.dun...@eu.citrix.com
> > > Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> > > changes
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, June 24, 2015 1:18 PM
> > > >
> > > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > > for direct-assigned devices, we need to update the associated IRTE
> > > > with the new guest vector, so external interrupts from the assigned
> > > > devices can be injected to guests without VM-Exit.
> > > >
> > > > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > > the destination vCPU. This follows the hardware behavior, since modern
> > > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > > >
> > > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > still use interrupt remapping.
> > > >
> > > > Signed-off-by: Feng Wu 
> > > > ---
> > > > v3:
> > > > - Use bitmap to store the all the possible destination vCPUs of an
> > > > interrupt, then trying to find the right destination from the bitmap
> > > > - Typo and some small changes
> > > >
> > > >  xen/drivers/passthrough/io.c | 96
> > > > +++-
> > > >  1 file changed, 95 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > > index 9b77334..18e24e1 100644
> > > > --- a/xen/drivers/passthrough/io.c
> > > > +++ b/xen/drivers/passthrough/io.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >
> > > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > > >
> > > > @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
> > *dpci)
> > > >  xfree(dpci);
> > > >  }
> > > >
> > > > +/*
> > > > + * The purpose of this routine is to find the right destination vCPU 
> > > > for
> > > > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > > > + * are several cases as below:
> > >
> > > If you aim to have this interface common to more usages, don't restrict to
> > > VT-d posted-interrupt which should be just an example.
> >
> > Yes, making this a common interface should be better.
> 
> Thinking about this a little more, this function itself is more or less restricted to
> VT-d posted-interrupt, since it doesn't handle multicast/broadcast interrupts;
> it only handles lowest-priority and single-destination interrupts. However, I
> can make the vector-hashing logic a separate function, which can be
> used elsewhere.
> 

iommu_intpost is a general option, not VT-d specific. It's fine to keep this 
function here. My earlier comment is more about the accuracy of the code
comment above. :-)

Thanks
Kevin
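
For reference, a separate vector-hashing helper of the kind mentioned in the quoted text could look roughly like this. It follows the two steps in the patch comment (collect the candidate vCPUs, then index with the guest vector modulo the number of candidates); `vector_hash_pick()` and its signature are hypothetical, not the function from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the id of the vCPU selected by vector hashing, or -1 if no
 * candidate is set. 'bitmap' has one bit per possible destination vCPU
 * and 'nr_vcpus' is the number of valid bits in it.
 */
static int vector_hash_pick(const unsigned long *bitmap,
                            unsigned int nr_vcpus, uint8_t gvec)
{
    unsigned int i, count = 0, idx;

    /* Count the candidate destination vCPUs. */
    for ( i = 0; i < nr_vcpus; i++ )
        if ( bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)) )
            count++;

    if ( count == 0 )
        return -1;

    /* "gvec % number of destination vCPUs" gives the index to pick. */
    idx = gvec % count;

    /* Walk the set bits until the idx-th candidate is reached. */
    for ( i = 0; i < nr_vcpus; i++ )
    {
        if ( !(bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) )
            continue;
        if ( idx-- == 0 )
            return (int)i;
    }

    return -1; /* not reached when count > 0 */
}
```

The helper works on plain ids, so the caller decides how an id maps back to a struct vcpu.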



Re: [Xen-devel] [PATCH v25 05/15] x86/VPMU: Initialize VPMUs with __initcall

2015-07-08 Thread Dietmar Hahn
Am Freitag 19 Juni 2015, 14:44:36 schrieb Boris Ostrovsky:
> Move some VPMU initialization operations into __initcalls to avoid performing
> the same tests and calculations for each vcpu.
> 
> Signed-off-by: Boris Ostrovsky 
> Acked-by: Jan Beulich 

For the Intel/VMX part:

Reviewed-by: Dietmar Hahn 

> ---
>  xen/arch/x86/hvm/svm/vpmu.c   | 106 --
>  xen/arch/x86/hvm/vmx/vpmu_core2.c | 151 
> +++---
>  xen/arch/x86/hvm/vpmu.c   |  32 
>  xen/include/asm-x86/hvm/vpmu.h|   2 +
>  4 files changed, 156 insertions(+), 135 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 481ea7b..b60ca40 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -356,54 +356,6 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t 
> *msr_content)
>  return 1;
>  }
>  
> -static int amd_vpmu_initialise(struct vcpu *v)
> -{
> -struct xen_pmu_amd_ctxt *ctxt;
> -struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -uint8_t family = current_cpu_data.x86;
> -
> -if ( counters == NULL )
> -{
> - switch ( family )
> -  {
> -  case 0x15:
> -  num_counters = F15H_NUM_COUNTERS;
> -  counters = AMD_F15H_COUNTERS;
> -  ctrls = AMD_F15H_CTRLS;
> -  k7_counters_mirrored = 1;
> -  break;
> -  case 0x10:
> -  case 0x12:
> -  case 0x14:
> -  case 0x16:
> -  default:
> -  num_counters = F10H_NUM_COUNTERS;
> -  counters = AMD_F10H_COUNTERS;
> -  ctrls = AMD_F10H_CTRLS;
> -  k7_counters_mirrored = 0;
> -  break;
> -  }
> -}
> -
> -ctxt = xzalloc_bytes(sizeof(*ctxt) +
> - 2 * sizeof(uint64_t) * num_counters);
> -if ( !ctxt )
> -{
> -gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> -" PMU feature is unavailable on domain %d vcpu %d.\n",
> -v->vcpu_id, v->domain->domain_id);
> -return -ENOMEM;
> -}
> -
> -ctxt->counters = sizeof(*ctxt);
> -ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
> -
> -vpmu->context = ctxt;
> -vpmu->priv_context = NULL;
> -vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
> -return 0;
> -}
> -
>  static void amd_vpmu_destroy(struct vcpu *v)
>  {
>  struct vpmu_struct *vpmu = vcpu_vpmu(v);
> @@ -474,30 +426,62 @@ struct arch_vpmu_ops amd_vpmu_ops = {
>  
>  int svm_vpmu_initialise(struct vcpu *v)
>  {
> +struct xen_pmu_amd_ctxt *ctxt;
>  struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -uint8_t family = current_cpu_data.x86;
> -int ret = 0;
>  
> -/* vpmu enabled? */
>  if ( vpmu_mode == XENPMU_MODE_OFF )
>  return 0;
>  
> -switch ( family )
> +if ( !counters )
> +return -EINVAL;
> +
> +ctxt = xzalloc_bytes(sizeof(*ctxt) +
> + 2 * sizeof(uint64_t) * num_counters);
> +if ( !ctxt )
>  {
> +printk(XENLOG_G_WARNING "Insufficient memory for PMU, "
> +   " PMU feature is unavailable on domain %d vcpu %d.\n",
> +   v->vcpu_id, v->domain->domain_id);
> +return -ENOMEM;
> +}
> +
> +ctxt->counters = sizeof(*ctxt);
> +ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
> +
> +vpmu->context = ctxt;
> +vpmu->priv_context = NULL;
> +
> +vpmu->arch_vpmu_ops = &amd_vpmu_ops;
> +
> +vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
> +return 0;
> +}
> +
> +int __init amd_vpmu_init(void)
> +{
> +switch ( current_cpu_data.x86 )
> +{
> +case 0x15:
> +num_counters = F15H_NUM_COUNTERS;
> +counters = AMD_F15H_COUNTERS;
> +ctrls = AMD_F15H_CTRLS;
> +k7_counters_mirrored = 1;
> +break;
>  case 0x10:
>  case 0x12:
>  case 0x14:
> -case 0x15:
>  case 0x16:
> -ret = amd_vpmu_initialise(v);
> -if ( !ret )
> -vpmu->arch_vpmu_ops = &amd_vpmu_ops;
> -return ret;
> +num_counters = F10H_NUM_COUNTERS;
> +counters = AMD_F10H_COUNTERS;
> +ctrls = AMD_F10H_CTRLS;
> +k7_counters_mirrored = 0;
> +break;
> +default:
> +printk(XENLOG_WARNING "VPMU: Unsupported CPU family %#x\n",
> +   current_cpu_data.x86);
> +return -EINVAL;
>  }
>  
> -printk("VPMU: Initialization failed. "
> -   "AMD processor family %d has not "
> -   "been supported\n", family);
> -return -EINVAL;
> +return 0;
>  }
>  
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c 
> b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index cfcdf42..025c970 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -708,62 +708,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs 
> *regs)
>  return 1;
>  }
>  
> -static int core2_vpmu_initialise(struct vcpu *v)
> -{
> -struct vpmu_struct *vpmu = vcpu_vpmu(

Re: [Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Jackson
Tiejun Chen writes ("[v6][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy"):
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,

Thanks.


I appreciate that I have come to this review late.  While I have found
the review conversation quite unsatisfactory, I don't really feel that
I can reject the patch series pending better answers to my questions.

Instead, I feel that I need to make a set of decisions which will
avoid my review comments being a blocker for this series.  After
discussing matters with the other tools maintainers, I have concluded:


* On the question of whether the default should be `strategy=host' or
  `strategy=none':

  I still don't understand what is going on here and I am frustrated
  because I don't feel that the replies I have been getting are
  actually answers to my questions.  They seem to be answers to
  different questions.

  However, the patch series with `strategy=none' is strictly less of a
  change to the codebase than with `strategy=host' and it is easy to
  change defaults later.  It would be perverse to block this
  functionality on the grounds that it is not enabled strongly enough
  by default.

  Therefore, despite the fact that after several rounds of emails I
  still do not have a convincing explanation, I am going to drop this
  line of questioning.


* On the question of the documentation: The documentation is
  unfortunately a poor guide to a user.  Many of my questions were
  prompted by reading the documentation.  Having gone several rounds
  of emails I still do not know enough to suggest improvements.

  In my view the effect of the poor documentation will be that most
  users will simply ignore the whole feature as too confusing.
  (Unless they have somehow divined that they are having RDM trouble
  in which case they may flail at random experimenting with various
  options.)

  Again, the effect therefore is that knowledgeable users might be
  able to do better, but for most users this is just yet another piece
  of docs for some feature they don't want to use.

  While I'm not entirely comfortable with accepting documentation
  which reduces the overall readability and usefulness of the manual,
  I think this is a relatively minor objection which I am prepared to
  overlook.

  Of course there is some opportunity for improving the documentation
  during the freeze.


* On the question of option naming, `strategy' vs `type':

  `type' was definitely wrong.  It may be that a better name than
  `strategy' would be correct.  This depends on the contemplated
  direction for future expansion.

  Sadly, I do not expect that further discussion is going to
  illuminate this further.  `strategy' will do.


* On the question of option naming, `none' vs `ignore':

  I asked whether the submitter agreed that `none' should be renamed
  `ignore'.  I have not received a clear opinion.  Instead, the
  submitter indicated a willingness to change this on my request.  The
  latest resubmission just did the rename.

  The purpose of asking `do you agree', in this way, is to try to help
  the submitters and the maintainers come up with the best answers.

  Note that it is a fundamental assumption of the patch review process
  that the submitter understands the design and implementation
  decisions embodied in the patchset.  The submitter needs to be able
  to respond to suggestions with evaluations, not simply acquiescence.
  (If it happens that some of the decisions were made by someone else,
  the submitter needs to 1. state this clearly where relevant and
  2. either consult the designers/authors, or if they aren't
  available, reverse-engineer the intent.)

  In the absence of a clear statement of the submitter's own opinion,
  I remain doubtful that this rename was correct.  But, I don't think
  it important enough to make any more fuss about.


* On the question of option naming, the `reserve='.

  Ian Campbell points out that the API structure for `[rdm_]reserve'
  as submitted is anomalous.  I agree with him.  The existing
  API and config file arrangements are rather too confusing.

  Please change `reserve' to `policy', in the following places:

  * In the xl rdm config parsing, `reserve=' should be `policy='.
  * In the xl pci config parsing, `rdm_reserve=' should be
`rdm_policy='.
  * The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  * The field name `reserve' in `libxl_rdm_reserve' should be
`policy'.


I think that with these changes I will be able to ack the remaining
tools parts of this series, and drop my objections to the parts acked
by Wei.

I can't speak for the hypervisor side, which I haven't really looked
at.

Ian.



Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 7:46 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> changes
> 
> > From: Wu, Feng
> > Sent: Wednesday, July 08, 2015 6:32 PM
> >
> >
> >
> > > -Original Message-
> > > From: Tian, Kevin
> > > Sent: Wednesday, July 08, 2015 6:23 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> > > Yang Z; george.dun...@eu.citrix.com
> > > Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> > > changes
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, June 24, 2015 1:18 PM
> > > >
> > > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > > for direct-assigned devices, we need to update the associated IRTE
> > > > with the new guest vector, so external interrupts from the assigned
> > > > devices can be injected to guests without VM-Exit.
> > > >
> > > > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > > the destination vCPU. This follows the hardware behavior, since modern
> > > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > > >
> > > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > still use interrupt remapping.
> > > >
> > > > Signed-off-by: Feng Wu 
> > > > ---
> > > > v3:
> > > > - Use bitmap to store the all the possible destination vCPUs of an
> > > > interrupt, then trying to find the right destination from the bitmap
> > > > - Typo and some small changes
> > > >
> > > >  xen/drivers/passthrough/io.c | 96
> > > > +++-
> > > >  1 file changed, 95 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > > index 9b77334..18e24e1 100644
> > > > --- a/xen/drivers/passthrough/io.c
> > > > +++ b/xen/drivers/passthrough/io.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >
> > > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > > >
> > > > @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
> *dpci)
> > > >  xfree(dpci);
> > > >  }
> > > >
> > > > +/*
> > > > > + * The purpose of this routine is to find the right destination vCPU for
> > > > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > > > + * are several cases as below:
> > >
> > > If you aim to have this interface common to more usages, don't restrict to
> > > VT-d posted-interrupt which should be just an example.
> >
> > Yes, making this a common interface should be better.
> >
> > >
> > > > + *
> > > > > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > > > > + *   guest vector using vector-hashing mechanism and return true. This follows
> > > > > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > > > > + *   handle the lowest-priority interrupt.
> > >
> > > Does AMD use same hashing mechanism? Can this interface be reused by
> > > other IOMMU type or it's an Intel specific implementation?
> >
> > I am not sure how AMD handles lowest-priority interrupts. Intel hardware guys told me
> > that recent Intel hardware platforms use this method to deliver lowest-priority
> > interrupts. What do you mean by "other IOMMU type"?
> >
> 
> The OS doesn't assume how vector hashing is done at the hardware level, so it
> should be fine to use the Intel algorithm in this emulation path. However, my
> point is just about the comment "since modern Intel CPUs use vector hashing to
> handle the lowest-priority interrupt". It's not because Intel does so; it's an
> implementation choice that you use the Intel algorithm here.

Here I can mention that we choose vector hashing for lowest-priority handling
and list Intel as an example of hardware that uses it, okay?
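
To make the vector-hashing idea concrete, here is a minimal standalone sketch. It is not the code from the patch: the function name `pick_dest_vcpu`, the 64-bit mask standing in for the destination bitmap, and the `vector % count` hash are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: choose the destination vCPU for a lowest-priority
 * interrupt by hashing the guest vector over the set of possible
 * destinations. Bit i of dest_mask is set if vCPU i is a candidate.
 * Returns the chosen vCPU index, or -1 if there is no candidate.
 */
static int pick_dest_vcpu(uint64_t dest_mask, unsigned int nr_vcpus,
                          uint8_t vector)
{
    unsigned int count = 0, idx, i;

    /* This sketch keeps the candidates in a single 64-bit word. */
    assert(nr_vcpus <= 64);

    /* Count the candidate vCPUs. */
    for ( i = 0; i < nr_vcpus; i++ )
        if ( dest_mask & (UINT64_C(1) << i) )
            count++;

    if ( count == 0 )
        return -1;

    /* The hash: the vector selects the idx-th candidate (0-based). */
    idx = vector % count;

    for ( i = 0; i < nr_vcpus; i++ )
    {
        if ( !(dest_mask & (UINT64_C(1) << i)) )
            continue;
        if ( idx-- == 0 )
            return (int)i;
    }

    return -1; /* not reached when count > 0 */
}
```

With candidates {0, 1, 3} (mask 0xB), vectors 48, 49 and 50 select vCPU 0, 1 and 3 respectively, so a given vector always maps to the same vCPU while different vectors spread across the candidates.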

Thanks,
Feng

> 
> Thanks
> Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, July 08, 2015 7:52 PM
> > > > > + * - For lowest-priority interrupts, we find the destination vCPU 
> > > > > from the
> > > > > + *   guest vector using vector-hashing mechanism and return true. 
> > > > > This
> > > > follows
> > > > > + *   the hardware behavior, since modern Intel CPUs use vector
> > hashing to
> > > > > + *   handle the lowest-priority interrupt.
> > > >
> > > > Does AMD use same hashing mechanism? Can this interface be reused by
> > > > other IOMMU type or it's an Intel specific implementation?
> > >
> > > I am not sure how AMD handles lowest-priority interrupts. Intel hardware guys told me
> > > that recent Intel hardware platforms use this method to deliver lowest-priority
> > > interrupts. What do you mean by "other IOMMU type"?
> > >
> >
> > The OS doesn't assume how vector hashing is done at the hardware level, so it
> > should be fine to use the Intel algorithm in this emulation path. However, my
> > point is just about the comment "since modern Intel CPUs use vector hashing to
> > handle the lowest-priority interrupt". It's not because Intel does so; it's an
> > implementation choice that you use the Intel algorithm here.
> 
> Here I can mention that we choose vector hashing for lowest-priority handling
> and list Intel as an example of hardware that uses it, okay?
> 

Yes. :-)

Thanks
Kevin



Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 7:31 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 08/15] Suppress posting interrupts when 'SN' is set
> 
> > From: Wu, Feng
> > Sent: Wednesday, July 08, 2015 6:11 PM
> > > From: Tian, Kevin
> > > Sent: Wednesday, July 08, 2015 5:06 PM
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, June 24, 2015 1:18 PM
> > > >
> > > > Currently, we don't support urgent interrupt, all interrupts
> > > > are recognized as non-urgent interrupt, so we cannot send
> > > > posted-interrupt when 'SN' is set.
> > > >
> > > > Signed-off-by: Feng Wu 
> > > > ---
> > > > v3:
> > > > use cmpxchg to test SN/ON and set ON
> > > >
> > > >  xen/arch/x86/hvm/vmx/vmx.c | 32
> > 
> > > >  1 file changed, 28 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > > index 0837627..b94ef6a 100644
> > > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > > @@ -1686,6 +1686,8 @@ static void
> __vmx_deliver_posted_interrupt(struct
> > > vcpu *v)
> > > >
> > > >  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> > > >  {
> > > > +struct pi_desc old, new, prev;
> > > > +
> > >
> > > move to 'else if'.
> > >
> > > >  if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> > > >  return;
> > > >
> > > > @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct
> vcpu
> > > *v, u8
> > > > vector)
> > > >   */
> > > >  pi_set_on(&v->arch.hvm_vmx.pi_desc);
> > > >  }
> > > > -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> > > > +else
> > > >  {
> > > > +prev.control = 0;
> > > > +
> > > > +do {
> > > > +old.control = v->arch.hvm_vmx.pi_desc.control &
> > > > +  ~(1 << POSTED_INTR_ON | 1 <<
> > > POSTED_INTR_SN);
> > > > +new.control = v->arch.hvm_vmx.pi_desc.control |
> > > > +  1 << POSTED_INTR_ON;
> > > > +
> > > > +/*
> > > > + * Currently, we don't support urgent interrupt, all
> > > > + * interrupts are recognized as non-urgent interrupt,
> > > > + * so we cannot send posted-interrupt when 'SN' is set.
> > > > + * Besides that, if 'ON' is already set, we cannot set
> > > > + * posted-interrupts as well.
> > > > + */
> > > > +if ( prev.sn || prev.on )
> > > > +{
> > > > +vcpu_kick(v);
> > > > +return;
> > > > +}
> > >
> > > would it make more sense to move above check after cmpxchg?
> >
> > My original idea is that we only need to do the check when
> > prev.control != old.control, which means the cmpxchg did not
> > complete successfully. If we add the check between cmpxchg
> > and while ( prev.control != old.control ), it seems the logic is
> > not so clear, since we don't need to check prev.sn and prev.on
> > when cmpxchg succeeds in setting the new value.
> >
> > Thanks,
> > Feng
> >
> 
> Then it'd be clearer if you move the check to the start of the loop, so
> you can avoid two additional reads when the prev.on/sn is set. :-)

Good idea!
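
For illustration only, the loop shape being discussed (check SN/ON first, then try the compare-and-exchange) could look roughly like the sketch below. It uses C11 atomics rather than Xen's cmpxchg(), and the name `pi_try_set_on` and the bit positions of ON and SN in the control word are assumptions for the example, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions within the 64-bit control word (illustrative). */
#define PI_ON (UINT64_C(1) << 0)
#define PI_SN (UINT64_C(1) << 1)

/*
 * Hypothetical sketch: set ON unless ON or SN is already set.
 * Returns true if this caller set ON, false if ON or SN was already set
 * and the control word was left untouched.
 */
static bool pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old;

    assert(control != NULL);

    old = atomic_load(control);
    do {
        /* The check sits at the start of the loop, before any update. */
        if ( old & (PI_ON | PI_SN) )
            return false;
        /* On failure, 'old' is refreshed with the current value. */
    } while ( !atomic_compare_exchange_weak(control, &old, old | PI_ON) );

    return true;
}
```

Because a failed compare-and-exchange hands back the current value, the descriptor does not need to be re-read separately on each iteration.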

Thanks,
Feng

> 
> Thanks
> Kevin



Re: [Xen-devel] [libvirt] [PATCH] libxl: support dom0

2015-07-08 Thread Michal Privoznik
On 07.07.2015 01:27, Jim Fehlig wrote:
> On 07/06/2015 03:46 PM, Jim Fehlig wrote:
>> In Xen, dom0 is really just another domain that supports ballooning,
>> adding/removing devices, changing vcpu configuration, etc. This patch
>> adds support to the libxl driver for managing dom0. Note that the
>> legacy xend driver has long supported managing dom0.
>>
>> Operations that are not supported on dom0 are filtered in libvirt
>> where a sensible error is reported. Errors from libxl are not
>> always helpful. E.g., attempting a save on dom0 results in
>>
>> 2015-06-23 15:25:05 MDT libxl: debug:
>> libxl_dom.c:1570:libxl__toolstack_save: domain=0 toolstack data size=8
>> 2015-06-23 15:25:05 MDT libxl: debug:
>> libxl.c:979:do_libxl_domain_suspend: ao 0x7f7e68000b70: inprogress:
>> poller=0x7f7e68000930, flags=i
>> 2015-06-23 15:25:05 MDT libxl-save-helper: debug: starting save: Success
>> 2015-06-23 15:25:05 MDT xc: detail: xc_domain_save_suse: starting save
>> of domid 0
>> 2015-06-23 15:25:05 MDT xc: error: Couldn't map live_shinfo (3 = No
>> such process): Internal error
>> 2015-06-23 15:25:05 MDT xc: detail: Save exit of domid 0 with errno=3
>> 2015-06-23 15:25:05 MDT libxl-save-helper: debug: complete r=1: No
>> such process
>> 2015-06-23 15:25:05 MDT libxl: error:
>> libxl_dom.c:1876:libxl__xc_domain_save_done: saving domain: domain did
>> not respond to suspend request: No such process
>> 2015-06-23 15:25:05 MDT libxl: error:
>> libxl_dom.c:2033:remus_teardown_done: Remus: failed to teardown device
>> for guest with domid 0, rc -8
>>
>> Signed-off-by: Jim Fehlig 
>> ---
>>   src/libxl/libxl_driver.c | 95
>> 
>>   1 file changed, 95 insertions(+)
>>
>> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
>> index 149ef70..d0b76ac 100644
>> --- a/src/libxl/libxl_driver.c
>> +++ b/src/libxl/libxl_driver.c
>> @@ -79,6 +79,15 @@ VIR_LOG_INIT("libxl.libxl_driver");
>>   /* Number of Xen scheduler parameters */
>>   #define XEN_SCHED_CREDIT_NPARAM   2
>>   +#define LIBXL_CHECK_DOM0_GOTO(name,
>> label)   \
>> +do
>> {  \
>> +if (STREQ_NULLABLE(name, "Domain-0"))
>> {   \
>> +virReportError(VIR_ERR_OPERATION_INVALID,
>> "%s",   \
>> +   _("Domain-0 does not support requested
>> operation")); \
>> +goto
>> label;   \
>> +   
>> } \
>> +} while (0)
>> +
>> static libxlDriverPrivatePtr libxl_driver;
>>   @@ -501,6 +510,62 @@ const struct libxl_event_hooks ev_hooks = {
>>   };
>> static int
>> +libxlAddDom0(libxlDriverPrivatePtr driver)
>> +{
>> +libxlDriverConfigPtr cfg = libxlDriverConfigGet(driver);
>> +virDomainDefPtr def = NULL;
>> +virDomainObjPtr vm = NULL;
>> +virDomainDefPtr oldDef = NULL;
>> +libxl_dominfo d_info;
>> +int ret = -1;
>> +
>> +libxl_dominfo_init(&d_info);
>> +
>> +/* Ensure we have a dom0 */
>> +if (libxl_domain_info(cfg->ctx, &d_info, 0) != 0) {
>> +virReportError(VIR_ERR_INTERNAL_ERROR,
>> +   "%s", _("unable to get Domain-0 information
>> from libxenlight"));
>> +goto cleanup;
>> +}
>> +
>> +if (!(def = virDomainDefNew()))
>> +goto cleanup;
>> +
>> +def->id = 0;
>> +def->virtType = VIR_DOMAIN_VIRT_XEN;
>> +if (VIR_STRDUP(def->name, "Domain-0") < 0)
>> +goto cleanup;
>> +
>> +def->os.type = VIR_DOMAIN_OSTYPE_XEN;
>> +
>> +if (virUUIDParse("----",
>> def->uuid) < 0)
>> +goto cleanup;
>> +
>> +vm->def->vcpus = d_info.vcpu_online;
>> +vm->def->maxvcpus = d_info.vcpu_max_id + 1;
>> +vm->def->mem.cur_balloon = d_info.current_memkb;
>> +vm->def->mem.max_balloon = d_info.max_memkb;
> 
> Oops. Before sending the patch, but after testing it again, I moved the
> call to libxl_domain_info to the beginning of this function.  I also
> moved setting the vcpu and memory info earlier, but
> 
>> +
>> +if (!(vm = virDomainObjListAdd(driver->domains, def,
>> +   driver->xmlopt,
>> +   0,
>> +   &oldDef)))
>> +goto cleanup;
>> +
>> +def = NULL;
>> +ret = 0;
> 
> before getting a virDomainObj - ouch.  Consider the following obvious
> fix squashed in
> 
> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
> index d0b76ac..c0dd00b 100644
> --- a/src/libxl/libxl_driver.c
> +++ b/src/libxl/libxl_driver.c
> @@ -541,18 +541,19 @@ libxlAddDom0(libxlDriverPrivatePtr driver)
>  if (virUUIDParse("----", def->uuid)
> < 0)
>  goto cleanup;
> 
> +if (!(vm = virDomainObjListAdd(driver->domains, def,
> +

Re: [Xen-devel] [RFC PATCH v3 11/18] xen/arm: ITS: Add GITS registers emulation

2015-07-08 Thread Ian Campbell
On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:
> From: Vijaya Kumar K 
> 
> Emulate GITS* registers and handle LPI configuration
> table update trap.

These need to only be exposed to a guest which has been configured with
an ITS. For dom0 that means at a minimum it needs to be based on the
capabilities of the underlying hardware.

The same is true of the next patch adding the GICR registers.

For domU it seems there is currently no ITS exposed to them, since there
are no toolstack changes here, so the emulation should be configured
accordingly.

> 
> Signed-off-by: Vijaya Kumar K 
> ---
>  xen/arch/arm/vgic-v3-its.c|  516 
> +
>  xen/include/asm-arm/gic-its.h |   14 ++
>  2 files changed, 530 insertions(+)
> 
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 0671434..fa9dccc 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -63,6 +63,46 @@ static void dump_cmd(its_cmd_block *cmd)
>  }
>  #endif
>  
> +void vgic_its_disable_lpis(struct vcpu *v, uint32_t vlpi)
> +{
> +struct pending_irq *p;
> +unsigned long flags;
> +
> +p = irq_to_pending(v, vlpi);
> +clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +gic_remove_from_queues(v, vlpi);
> +if ( p->desc != NULL )
> +{
> +spin_lock_irqsave(&p->desc->lock, flags);
> +p->desc->handler->disable(p->desc);
> +spin_unlock_irqrestore(&p->desc->lock, flags);
> +}
> +}
> +
> +void vgic_its_enable_lpis(struct vcpu *v, uint32_t vlpi, uint8_t priority)
> +{
> +struct pending_irq *p;
> +unsigned long flags;
> +
> +/* Get plpi for the given vlpi */
> +p = irq_to_pending(v, vlpi);
> +p->priority = priority;
> +set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +
> +spin_lock_irqsave(&v->arch.vgic.lock, flags);
> +
> +if ( !list_empty(&p->inflight) &&
> + !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> +gic_raise_guest_irq(v, irq_to_virq(p->desc), p->priority);
> +
> +spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> +if ( p->desc != NULL )
> +{
> +spin_lock_irqsave(&p->desc->lock, flags);
> +p->desc->handler->enable(p->desc);
> +spin_unlock_irqrestore(&p->desc->lock, flags);
> +}
> +}
>  /* ITS device table helper functions */
>  int vits_vdevice_entry(struct domain *d, uint32_t dev_id,
> struct vdevice_table *entry, int set)
> @@ -649,6 +689,482 @@ err:
>  return 0;
>  }
>  
> +static int vgic_v3_gits_lpi_mmio_read(struct vcpu *v, mmio_info_t *info)
> +{
> +uint32_t offset;
> +struct hsr_dabt dabt = info->dabt;
> +struct cpu_user_regs *regs = guest_cpu_user_regs();
> +register_t *r = select_user_reg(regs, dabt.reg);
> +uint8_t cfg;
> +
> +offset = info->gpa -
> + (v->domain->arch.vits->propbase & 0xf000UL);
> +
> +if ( offset < SZ_64K )
> +{
> +DPRINTK("vITS:d%dv%d LPI Table read offset 0x%x\n",
> +v->domain->domain_id, v->vcpu_id, offset);
> +cfg = readb_relaxed(v->domain->arch.vits->prop_page + offset);
> +*r = cfg;
> +return 1;
> +}
> +else
> +dprintk(XENLOG_G_ERR, "vITS:d%dv%d LPI Table read with wrong offset 
> 0x%x\n",
> +v->domain->domain_id, v->vcpu_id, offset);
> +
> +
> +return 0;
> +}
> +
> +static int vgic_v3_gits_lpi_mmio_write(struct vcpu *v, mmio_info_t *info)
> +{
> +uint32_t offset;
> +uint32_t vid;
> +uint8_t cfg;
> +bool_t enable;
> +struct hsr_dabt dabt = info->dabt;
> +struct cpu_user_regs *regs = guest_cpu_user_regs();
> +register_t *r = select_user_reg(regs, dabt.reg);
> +
> +offset = info->gpa -
> + (v->domain->arch.vits->propbase & 0xf000UL);
> +
> +vid = offset + NR_GIC_LPI;
> +if ( offset < SZ_64K )
> +{
> +DPRINTK("vITS:d%dv%d LPI Table write offset 0x%x\n",
> +v->domain->domain_id, v->vcpu_id, offset);
> +cfg = readb_relaxed(v->domain->arch.vits->prop_page + offset);
> +enable = (cfg & *r) & 0x1;
> +
> +if ( !enable )
> + vgic_its_enable_lpis(v, vid,  (*r & 0xfc));
> +else
> + vgic_its_disable_lpis(v, vid);
> +
> +/* Update virtual prop page */
> +writeb_relaxed((*r & 0xff),
> +v->domain->arch.vits->prop_page + offset);
> +
> +return 1;
> +}
> +else
> +dprintk(XENLOG_G_ERR, "vITS:d%dv%d LPI Table invalid write @ 0x%x\n",
> +v->domain->domain_id, v->vcpu_id, offset);
> +
> +return 0; 
> +}
> +
> +static const struct mmio_handler_ops vgic_gits_lpi_mmio_handler = {
> +.read_handler  = vgic_v3_gits_lpi_mmio_read,
> +.write_handler = vgic_v3_gits_lpi_mmio_write,
> +};
> +
> +int vgic_its_unmap_lpi_prop(struct vcpu *v)
> +{
> +paddr_t maddr;
> +uint32_t lpi_size;
> +int i;
> +
>

Re: [Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

2015-07-08 Thread Lengyel, Tamas
On Wed, Jul 8, 2015 at 6:22 AM, Razvan Cojocaru 
wrote:

> Deny register writes if a vm_client subscribed to mov_to_msr or
> control register write events forbids them. Currently supported for
> MSR, CR0, CR3 and CR4 events.
>
> Signed-off-by: Razvan Cojocaru 
> Acked-by: George Dunlap 
> Acked-by: Jan Beulich 
>
> ---
> Changes since V3:
>  - Renamed MEM_ACCESS_FLAG_DENY to VM_EVENT_FLAG_DENY (and fixed
>the bit shift appropriately).
>  - Moved the DENY vm_event response logic from p2m.c to newly
>added dedicated files for vm_event handling, as suggested
>by Tamas Lengyel.
>

This looks good to me. It will have to be rebased on staging once the other
series is merged, as a couple of things will conflict. If this series lands
first, however, the newly added asm/vm_event files lack the required license
header.

With that:
Acked-by: Tamas K Lengyel 


Re: [Xen-devel] [PATCH v25 06/15] x86/VPMU: Initialize PMU for PV(H) guests

2015-07-08 Thread Dietmar Hahn
Am Freitag 19 Juni 2015, 14:44:37 schrieb Boris Ostrovsky:
> Code for initializing/tearing down PMU for PV guests
> 
> Signed-off-by: Boris Ostrovsky 
> Acked-by: Daniel De Graaf 
> Acked-by: Jan Beulich 
> Acked-by: Kevin Tian 

Reviewed-by: Dietmar Hahn 

> ---
>  tools/flask/policy/policy/modules/xen/xen.te |   4 +
>  xen/arch/x86/domain.c|   2 +
>  xen/arch/x86/hvm/hvm.c   |   1 +
>  xen/arch/x86/hvm/svm/svm.c   |   4 +-
>  xen/arch/x86/hvm/svm/vpmu.c  |  16 +++-
>  xen/arch/x86/hvm/vmx/vmx.c   |   4 +-
>  xen/arch/x86/hvm/vmx/vpmu_core2.c|  30 --
>  xen/arch/x86/hvm/vpmu.c  | 131 
> ---
>  xen/common/event_channel.c   |   1 +
>  xen/include/asm-x86/hvm/vpmu.h   |   2 +
>  xen/include/public/pmu.h |   2 +
>  xen/include/public/xen.h |   1 +
>  xen/include/xsm/dummy.h  |   3 +
>  xen/xsm/flask/hooks.c|   4 +
>  xen/xsm/flask/policy/access_vectors  |   2 +
>  15 files changed, 181 insertions(+), 26 deletions(-)
> 
> diff --git a/tools/flask/policy/policy/modules/xen/xen.te 
> b/tools/flask/policy/policy/modules/xen/xen.te
> index 45b5cb2..f553eb5 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.te
> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> @@ -130,6 +130,10 @@ if (guest_writeconsole) {
>   dontaudit domain_type xen_t : xen writeconsole;
>  }
>  
> +# Allow all domains to use PMU (but not to change its settings --- that's 
> what
> +# pmu_ctrl is for)
> +allow domain_type xen_t:xen2 pmu_use;
> +
>  
> ###
>  #
>  # Domain creation
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index dc18565..b699f68 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -438,6 +438,8 @@ int vcpu_initialise(struct vcpu *v)
>  vmce_init_vcpu(v);
>  }
>  
> +spin_lock_init(&v->arch.vpmu.vpmu_lock);
> +
>  if ( has_hvm_container_domain(d) )
>  {
>  rc = hvm_vcpu_initialise(v);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index d5e5242..83a81f5 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4931,6 +4931,7 @@ static hvm_hypercall_t *const 
> pvh_hypercall64_table[NR_hypercalls] = {
>  HYPERCALL(hvm_op),
>  HYPERCALL(sysctl),
>  HYPERCALL(domctl),
> +HYPERCALL(xenpmu_op),
>  [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation
>  };
>  
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index a02f983..680eebe 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1165,7 +1165,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
>  return rc;
>  }
>  
> -vpmu_initialise(v);
> +/* PVH's VPMU is initialized via hypercall */
> +if ( is_hvm_vcpu(v) )
> +vpmu_initialise(v);
>  
>  svm_guest_osvw_init(v);
>  
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index b60ca40..a8572a6 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -364,13 +364,11 @@ static void amd_vpmu_destroy(struct vcpu *v)
>  amd_vpmu_unset_msr_bitmap(v);
>  
>  xfree(vpmu->context);
> -vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
>  
>  if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
> -{
> -vpmu_reset(vpmu, VPMU_RUNNING);
>  release_pmu_ownship(PMU_OWNER_HVM);
> -}
> +
> +vpmu_clear(vpmu);
>  }
>  
>  /* VPMU part of the 'q' keyhandler */
> @@ -482,6 +480,16 @@ int __init amd_vpmu_init(void)
>  return -EINVAL;
>  }
>  
> +if ( sizeof(struct xen_pmu_data) +
> + 2 * sizeof(uint64_t) * num_counters > PAGE_SIZE )
> +{
> +printk(XENLOG_WARNING
> +   "VPMU: Register bank does not fit into VPMU shared page\n");
> +counters = ctrls = NULL;
> +num_counters = 0;
> +return -ENOSPC;
> +}
> +
>  return 0;
>  }
>  
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 0837627..50e11dd 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -140,7 +140,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  }
>  }
>  
> -vpmu_initialise(v);
> +/* PVH's VPMU is initialized via hypercall */
> +if ( is_hvm_vcpu(v) )
> +vpmu_initialise(v);
>  
>  vmx_install_vlapic_mapping(v);
>  
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c 
> b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index 025c970..e7642e5 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -365,13 +365,16 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
>  if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
>  return 0;
>
