Re: [Xen-devel] Missing "\n" in dump_softtsc() from xen/arch/x86/time.c

2016-12-12 Thread Jan Beulich
>>> On 10.12.16 at 17:43,  wrote:
> When doing "xl debug-key s; xl dmesg | tail" as instructed for checking
> TSC emulation mode on x86, I always see this line unterminated:
> 
> (XEN) dom2(hvm): mode=0,ofs=0x204b1518d4d,khz=2672736,inc=1
> 
> Apparently, the cause is simple: in xen/arch/x86/time.c there is
> 
> static void dump_softtsc(unsigned char key)
> {
> // ... (skipped) ...
> for_each_domain ( d )
> {
> if ( is_hardware_domain(d) && d->arch.tsc_mode == TSC_MODE_DEFAULT )
> continue;
> printk("dom%u%s: mode=%d",d->domain_id,
> is_hvm_domain(d) ? "(hvm)" : "", d->arch.tsc_mode);
> if ( d->arch.vtsc_offset )
> printk(",ofs=%#"PRIx64, d->arch.vtsc_offset);
> if ( d->arch.tsc_khz )
> printk(",khz=%"PRIu32, d->arch.tsc_khz);
> if ( d->arch.incarnation )
> printk(",inc=%"PRIu32, d->arch.incarnation);
> #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
> if ( !(d->arch.vtsc_kerncount | d->arch.vtsc_usercount) )
> printk("\n");
> else
> printk(",vtsc count: %"PRIu64" kernel, %"PRIu64" user\n",
>d->arch.vtsc_kerncount, d->arch.vtsc_usercount);
> #endif
> domcnt++;
> }
> // ... (skipped) ...
> }
> 
> That is, "\n" is only output in debug builds. Please, add an unconditional
> printk("\n") in #else clause for regular builds. Following is my formal
> patch proposal, but it is so trivial that you better reimplement that
> yourself as your rules mandate, with all that "Signed-off-by" and other
> cool git-related and legal-related stuff. Thanks in advance.

Please submit patches as standalone mails, i.e. without initial
problem description (other than what goes in the commit message).
Also please include [PATCH] in the subject.

> From dac74660f526436ecc6a2fa0101ec81f6822eb66 Mon Sep 17 00:00:00 2001
> From: Anton Samsonov 
> Date: Sat, 10 Dec 2016 18:51:38 +0300
> Subject: [PATCH] x86/time: Fix missing "\n" in non-debug log.
> 
> On x86, when dumping the emulated TSC information to dmesg, the line
> 
> dom%u%s: mode=%d,ofs=0#ul,khz=%u,inc=%u
> 
> is formed by several conditional statements, possibly followed by other
> printk() calls which ultimately end the line with "\n", but only in debug
> builds or with performance counters enabled, - in regular builds, however,
> the line stays unfinished and continued by further output.
> 
> This patch adds such final printk("\n") for regular builds, too.
> ---

There's no Signed-off-by here, without which we won't be able to
take the patch. However, ...

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2186,6 +2186,8 @@ static void dump_softtsc(unsigned char key)
>  else
>  printk(",vtsc count: %"PRIu64" kernel, %"PRIu64" user\n",
> d->arch.vtsc_kerncount, d->arch.vtsc_usercount);
> +#else
> +printk("\n");
>  #endif
>  domcnt++;
>  }

I think this wants doing differently - I'll submit a replacement patch
in a minute, with you as the reporter of the issue.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] X86/VPMU: mask off uncore overflow bit on xeon phi knights landing

2016-12-12 Thread Kang, Luwei
> >>> On 12.12.16 at 07:51,  wrote:
> > By the way, I think another place may need to do some modify as well.
> >
> > @@ -868,7 +868,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs 
> > *regs)
> >  if ( is_pmc_quirk )
> >  handle_pmc_quirk(msr_content);
> >  core2_vpmu_cxt->global_status |= msr_content;
> > -msr_content = ~global_ovf_ctrl_mask;
> > +msr_content &= ~global_ovf_ctrl_mask;
> >  wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
> >  }
> >
> > If one counter have overflow all the bit will be clean. I think it
> > need add & with current status.
> >
> > Hi jan and Andrew,
> > What is your opinion?
> 
> I agree, but it looks to be an independent change.
> 
Hi Jan,
Thanks for your reply. I will submit another patch regarding " msr_content 
&= ~global_ovf_ctrl_mask;" soon.
What is your opinion about this patch? Need any modify?

Thanks,
Luwei Kang 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] x86/time: don't omit newline in dump_softtsc()

2016-12-12 Thread Jan Beulich
Reported-by: Anton Samsonov 
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2145,12 +2145,11 @@ static void dump_softtsc(unsigned char k
 if ( d->arch.incarnation )
 printk(",inc=%"PRIu32, d->arch.incarnation);
 #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
-if ( !(d->arch.vtsc_kerncount | d->arch.vtsc_usercount) )
-printk("\n");
-else
-printk(",vtsc count: %"PRIu64" kernel, %"PRIu64" user\n",
+if ( d->arch.vtsc_kerncount | d->arch.vtsc_usercount )
+printk(",vtsc count: %"PRIu64" kernel,%"PRIu64" user",
d->arch.vtsc_kerncount, d->arch.vtsc_usercount);
 #endif
+printk("\n");
 domcnt++;
 }
 



x86/time: don't omit newline in dump_softtsc()

Reported-by: Anton Samsonov 
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2145,12 +2145,11 @@ static void dump_softtsc(unsigned char k
 if ( d->arch.incarnation )
 printk(",inc=%"PRIu32, d->arch.incarnation);
 #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
-if ( !(d->arch.vtsc_kerncount | d->arch.vtsc_usercount) )
-printk("\n");
-else
-printk(",vtsc count: %"PRIu64" kernel, %"PRIu64" user\n",
+if ( d->arch.vtsc_kerncount | d->arch.vtsc_usercount )
+printk(",vtsc count: %"PRIu64" kernel,%"PRIu64" user",
d->arch.vtsc_kerncount, d->arch.vtsc_usercount);
 #endif
+printk("\n");
 domcnt++;
 }
 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] X86/VPMU: mask off uncore overflow bit on xeon phi knights landing

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 09:06,  wrote:
>> >>> On 12.12.16 at 07:51,  wrote:
>> > By the way, I think another place may need to do some modify as well.
>> >
>> > @@ -868,7 +868,7 @@ static int core2_vpmu_do_interrupt(struct 
>> > cpu_user_regs *regs)
>> >  if ( is_pmc_quirk )
>> >  handle_pmc_quirk(msr_content);
>> >  core2_vpmu_cxt->global_status |= msr_content;
>> > -msr_content = ~global_ovf_ctrl_mask;
>> > +msr_content &= ~global_ovf_ctrl_mask;
>> >  wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
>> >  }
>> >
>> > If one counter have overflow all the bit will be clean. I think it
>> > need add & with current status.
>> >
>> > Hi jan and Andrew,
>> > What is your opinion?
>> 
>> I agree, but it looks to be an independent change.
>> 
> Thanks for your reply. I will submit another patch regarding " 
> msr_content &= ~global_ovf_ctrl_mask;" soon.
> What is your opinion about this patch? Need any modify?

"This" being which one? The one visible above, or the one at the
root of this thread? For the one above I've already said "I agree",
so I'm not sure what else you ask for. For the one at the root of
this thread, I think Boris has pointed out what changes would be
wanted.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Fix errornous negation for isstubdom, which breaks HVM pci-passthrough.

2016-12-12 Thread Cedric Bosdonnat
Nice catch!
ACK from me, thought I don't know if it really counts ;)

--
Cedric

On Sat, 2016-12-10 at 18:59 +0100, Sander Eikelenboom wrote:
> Commit 20b75251d9721d9c050a973c02baac396c794ade introduced an errornous
> negation which gave the isstubdom bool the opposite semantics,
> causing the subsequent code to take the wrong code path.
> 
> Signed-off-by: Sander Eikelenboom 
> ---
>  tools/libxl/libxl_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 3b707f3..8395352 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -1001,7 +1001,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
> libxl_device_pci *pcidev, i
>  int irq, i, rc, hvm = 0;
>  uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>  uint32_t domainid = domid;
> -bool isstubdom = !libxl_is_stubdom(ctx, domid, &domainid);
> +bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>  
>  if (type == LIBXL_DOMAIN_TYPE_INVALID)
>  return ERROR_FAIL;

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/setup: Don't relocate p2m/initrd over existing one

2016-12-12 Thread Juergen Gross
On 09/12/16 16:50, Ross Lagerwall wrote:
> When relocating the p2m/initrd, take special care not to relocate it so
> that is overlaps with the current location of the p2m/initrd. This is
> needed since the full extent of the current location is not marked as a
> reserved region in the e820 (and it shouldn't be since it is about to be
> moved).
> 
> This was seen to happen to a dom0 with a large initial p2m and a small
> reserved region in the middle of the initial p2m.
> 
> Signed-off-by: Ross Lagerwall 

Wouldn't it be more natural to memblock_reserve() the hypervisor
supplied p2m list even in case of an E820 conflict and do a
memblock_free() of that region after moving the p2m list in
xen_relocate_p2m() ?

This would avoid the need to supply a range to avoid for memory
allocation when calling xen_find_free_area(), which is pointless in the
initrd case (the original initrd area is already reserved via
memblock_reserve() ).


Juergen

> ---
>  arch/x86/xen/mmu.c |  4 ++--
>  arch/x86/xen/setup.c   | 16 ++--
>  arch/x86/xen/xen-ops.h |  5 +++--
>  3 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 7d5afdb..bc40325 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -2074,7 +2074,7 @@ static phys_addr_t __init 
> xen_early_virt_to_phys(unsigned long vaddr)
>   * Find a new area for the hypervisor supplied p2m list and relocate the p2m 
> to
>   * this area.
>   */
> -void __init xen_relocate_p2m(void)
> +void __init xen_relocate_p2m(phys_addr_t cur_start, phys_addr_t cur_size)
>  {
>   phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys;
>   unsigned long p2m_pfn, p2m_pfn_end, n_frames, pfn, pfn_end;
> @@ -2092,7 +2092,7 @@ void __init xen_relocate_p2m(void)
>   n_pud = roundup(size, PGDIR_SIZE) >> PGDIR_SHIFT;
>   n_frames = n_pte + n_pt + n_pmd + n_pud;
>  
> - new_area = xen_find_free_area(PFN_PHYS(n_frames));
> + new_area = xen_find_free_area(PFN_PHYS(n_frames), cur_start, cur_size);
>   if (!new_area) {
>   xen_raw_console_write("Can't find new memory area for p2m 
> needed due to E820 map conflict\n");
>   BUG();
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index f8960fc..513c48b 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -634,14 +634,15 @@ bool __init xen_is_e820_reserved(phys_addr_t start, 
> phys_addr_t size)
>  }
>  
>  /*
> - * Find a free area in physical memory not yet reserved and compliant with
> - * E820 map.
> + * Find a free area in physical memory not yet reserved, compliant with the
> + * E820 map and not overlapping with the pre-allocated area.
>   * Used to relocate pre-allocated areas like initrd or p2m list which are in
>   * conflict with the to be used E820 map.
>   * In case no area is found, return 0. Otherwise return the physical address
>   * of the area which is already reserved for convenience.
>   */
> -phys_addr_t __init xen_find_free_area(phys_addr_t size)
> +phys_addr_t __init xen_find_free_area(phys_addr_t size, phys_addr_t 
> cur_start,
> +   phys_addr_t cur_size)
>  {
>   unsigned mapcnt;
>   phys_addr_t addr, start;
> @@ -652,7 +653,8 @@ phys_addr_t __init xen_find_free_area(phys_addr_t size)
>   continue;
>   start = entry->addr;
>   for (addr = start; addr < start + size; addr += PAGE_SIZE) {
> - if (!memblock_is_reserved(addr))
> + if (!memblock_is_reserved(addr) &&
> + (addr < cur_start || addr >= cur_start + cur_size))
>   continue;
>   start = addr + PAGE_SIZE;
>   if (start + size > entry->addr + entry->size)
> @@ -726,7 +728,7 @@ static void __init xen_reserve_xen_mfnlist(void)
>   xen_raw_console_write("Xen hypervisor allocated p2m list conflicts with 
> E820 map\n");
>   BUG();
>  #else
> - xen_relocate_p2m();
> + xen_relocate_p2m(start, size);
>  #endif
>  }
>  
> @@ -887,7 +889,9 @@ char * __init xen_memory_setup(void)
>boot_params.hdr.ramdisk_size)) {
>   phys_addr_t new_area, start, size;
>  
> - new_area = xen_find_free_area(boot_params.hdr.ramdisk_size);
> + new_area = xen_find_free_area(boot_params.hdr.ramdisk_size,
> +   boot_params.hdr.ramdisk_image,
> +   boot_params.hdr.ramdisk_size);
>   if (!new_area) {
>   xen_raw_console_write("Can't find new memory area for 
> initrd needed due to E820 map conflict\n");
>   BUG();
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 3cbce3b..d3342b8 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -41,14 +41,15 @@ void __init xen_pt_check_e820(void);
> 

Re: [Xen-devel] [PATCH v2] fix potential int overflow in efi/boot

2016-12-12 Thread Jan Beulich
>>> On 09.12.16 at 20:52,  wrote:
> HorizontalResolution and VerticalResolution are 32bit, while size is
> 64bit. As it stands multiplications are evaluated with 32bit arithmetic,
> which could overflow. Cast HorizontalResolution to 64bit to avoid that.
> 
> Coverity-ID: 1381858
> 
> Signed-off-by: Stefano Stabellini 

Acked-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 03:27,  wrote:
> On 12/09/16 16:35 -0500, Konrad Rzeszutek Wilk wrote:
>>On Mon, Oct 10, 2016 at 08:32:20AM +0800, Haozhong Zhang wrote:
>>> @@ -1413,7 +1414,7 @@ int memory_add(unsigned long spfn, unsigned long 
>>> epfn, unsigned int pxm)
>>>  info.epfn = epfn;
>>>  info.cur = spfn;
>>>
>>> -ret = extend_frame_table(&info);
>>> +ret = extend_frame_table(&info, &info);
>>
>>is equivalant to 'info' so I am not sure I understand the purpose
>>behind this patch?
>>
> 
> Yes, they are identical for the ordinary RAM here, and the frame table
> is allocated at the begin of the hot-added RAM. For NVDIMM, the
> hypervisor does not know which part is used for data, so the second
> parameter 'alloc_info' is used to indicate which part can be used for
> the frame table, and might be different from 'info'.

In which case you want to add a comment ahead of that function
clarifying the difference (an perhaps pointing out that the two
may be identical).

Also the commit message should be adjusted a little - this is only
preparation for pmem support, so the wording should reflect that.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 03:31,  wrote:
> On 12/09/16 16:38 -0500, Konrad Rzeszutek Wilk wrote:
>>On Mon, Oct 10, 2016 at 08:32:21AM +0800, Haozhong Zhang wrote:
>>> @@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long 
>>> epfn, unsigned int pxm)
>>>  total_pages += epfn - spfn;
>>>
>>>  set_pdx_range(spfn, epfn);
>>> -ret = setup_m2p_table(&info);
>>> +ret = setup_m2p_table(&info, &info);
>>
>>I am not sure I follow this logic. You are passing the same contents, it
>>is just that 'alloc_info' and 'info' are aliased together?
>>
> 
> Similarly to patch 1, the two parameters of setup_m2p_table() are
> identical for the ordinary RAM, and can be different for NVDIMM.

And the same comments as for patch 1 apply here then.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/setup: Don't relocate p2m/initrd over existing one

2016-12-12 Thread Ross Lagerwall

On 12/12/2016 08:19 AM, Juergen Gross wrote:

On 09/12/16 16:50, Ross Lagerwall wrote:

When relocating the p2m/initrd, take special care not to relocate it so
that is overlaps with the current location of the p2m/initrd. This is
needed since the full extent of the current location is not marked as a
reserved region in the e820 (and it shouldn't be since it is about to be
moved).

This was seen to happen to a dom0 with a large initial p2m and a small
reserved region in the middle of the initial p2m.

Signed-off-by: Ross Lagerwall 


Wouldn't it be more natural to memblock_reserve() the hypervisor
supplied p2m list even in case of an E820 conflict and do a
memblock_free() of that region after moving the p2m list in
xen_relocate_p2m() ?

This would avoid the need to supply a range to avoid for memory
allocation when calling xen_find_free_area(), which is pointless in the
initrd case (the original initrd area is already reserved via
memblock_reserve() ).



I'm not familiar with the code, but I was concerned that if part of the 
hypervisor-supplied p2m was already reserved by memblock_reserve(), then 
the above approach would remove it when memblock_free is called. If this 
is not a valid concern, then I can send a patch to do it.


--
Ross Lagerwall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 05:16,  wrote:
> On 12/09/16 17:02 -0500, Konrad Rzeszutek Wilk wrote:
>>On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
>>> +static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>>> +  unsigned long rsv_spfn, unsigned long rsv_epfn,
>>> +  unsigned long data_spfn, unsigned long data_epfn)
>>> +{
>>> +if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
>>> +return 0;
>>
>>Hm, I think it ought to be possible to have no rsv area..?
> 
> A reserved area must be provided to storing frametable and M2P table of 
> NVDIMM.

Is this really "must" rather than just "should", i.e. can't you do
without even if you prefer to have it?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table

2016-12-12 Thread Haozhong Zhang

On 12/12/16 01:26 -0700, Jan Beulich wrote:

On 12.12.16 at 03:31,  wrote:

On 12/09/16 16:38 -0500, Konrad Rzeszutek Wilk wrote:

On Mon, Oct 10, 2016 at 08:32:21AM +0800, Haozhong Zhang wrote:

@@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, 
unsigned int pxm)
 total_pages += epfn - spfn;

 set_pdx_range(spfn, epfn);
-ret = setup_m2p_table(&info);
+ret = setup_m2p_table(&info, &info);


I am not sure I follow this logic. You are passing the same contents, it
is just that 'alloc_info' and 'info' are aliased together?



Similarly to patch 1, the two parameters of setup_m2p_table() are
identical for the ordinary RAM, and can be different for NVDIMM.


And the same comments as for patch 1 apply here then.

Jan



I'll add comments and clarify in commit messages for both patch 1 and
patch 2 in the next version.

Thanks,
Haozhong

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/setup: Don't relocate p2m/initrd over existing one

2016-12-12 Thread Juergen Gross
On 12/12/16 09:28, Ross Lagerwall wrote:
> On 12/12/2016 08:19 AM, Juergen Gross wrote:
>> On 09/12/16 16:50, Ross Lagerwall wrote:
>>> When relocating the p2m/initrd, take special care not to relocate it so
>>> that is overlaps with the current location of the p2m/initrd. This is
>>> needed since the full extent of the current location is not marked as a
>>> reserved region in the e820 (and it shouldn't be since it is about to be
>>> moved).
>>>
>>> This was seen to happen to a dom0 with a large initial p2m and a small
>>> reserved region in the middle of the initial p2m.
>>>
>>> Signed-off-by: Ross Lagerwall 
>>
>> Wouldn't it be more natural to memblock_reserve() the hypervisor
>> supplied p2m list even in case of an E820 conflict and do a
>> memblock_free() of that region after moving the p2m list in
>> xen_relocate_p2m() ?
>>
>> This would avoid the need to supply a range to avoid for memory
>> allocation when calling xen_find_free_area(), which is pointless in the
>> initrd case (the original initrd area is already reserved via
>> memblock_reserve() ).
>>
> 
> I'm not familiar with the code, but I was concerned that if part of the
> hypervisor-supplied p2m was already reserved by memblock_reserve(), then
> the above approach would remove it when memblock_free is called. If this
> is not a valid concern, then I can send a patch to do it.

If this would be the case we'd be in trouble: after moving the p2m list
to another location nobody should reference the old p2m list as it might
be located at a position to be remapped due to E820 memory map
constraints.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/hvm: don't create a default ioreq server...

2016-12-12 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 12 December 2016 07:54
> To: Paul Durrant 
> Cc: Andrew Cooper ; xen-
> de...@lists.xenproject.org
> Subject: Re: [PATCH] x86/hvm: don't create a default ioreq server...
> 
> >>> On 09.12.16 at 18:55,  wrote:
> > ...if the domain is not under construction.
> 
> I think the title will end up misleading this way. How about "x86/hvm:
> don't unconditionally create a default ioreq server" with "Avoid doing
> so if the domain is not under construction" as the 1st sentence?
> 

Yes, that would avoid the shortlog being misleading.

> As the patch is otherwise ready to go in, that adjustment could
> certainly be done while committing (i.e. no re-send necessary).
> 

Thanks. I also realised there is typo in the code comment:

" the query should have that side-effect" should be " the query should *not* 
have that side-effect"

Please can you fix this too.

Cheers,

  Paul

> Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions

2016-12-12 Thread Haozhong Zhang

On 12/12/16 01:30 -0700, Jan Beulich wrote:

On 12.12.16 at 05:16,  wrote:

On 12/09/16 17:02 -0500, Konrad Rzeszutek Wilk wrote:

On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:

+static int pmem_add_check(unsigned long spfn, unsigned long epfn,
+  unsigned long rsv_spfn, unsigned long rsv_epfn,
+  unsigned long data_spfn, unsigned long data_epfn)
+{
+if ( spfn >= epfn || rsv_spfn >= rsv_epfn || data_spfn >= data_epfn )
+return 0;


Hm, I think it ought to be possible to have no rsv area..?


A reserved area must be provided to storing frametable and M2P table of
NVDIMM.


Is this really "must" rather than just "should", i.e. can't you do
without even if you prefer to have it?

Jan



It's a must in this version, but I relax this requirement in the WIP
v2 patch that fallback to the ordinary RAM if no reserved pmem area is
indicated.

Haozhong

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Fix errornous negation for isstubdom, which breaks HVM pci-passthrough.

2016-12-12 Thread Wei Liu
On Sat, Dec 10, 2016 at 06:59:08PM +0100, Sander Eikelenboom wrote:
> Commit 20b75251d9721d9c050a973c02baac396c794ade introduced an errornous
> negation which gave the isstubdom bool the opposite semantics,
> causing the subsequent code to take the wrong code path.
> 
> Signed-off-by: Sander Eikelenboom 

Acked-by: Wei Liu 

Though I would like to shorten the title a bit. That is, I will move the
clause "which ..." to the end of the commit log.

  Commit 20b75251d9721d9c050a973c02baac396c794ade introduced an
  erroneous negation which gave the isstubdom bool the opposite semantics,
  causing the subsequent code to take the wrong code path, which breaks
  ...

> ---
>  tools/libxl/libxl_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 3b707f3..8395352 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -1001,7 +1001,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
> libxl_device_pci *pcidev, i
>  int irq, i, rc, hvm = 0;
>  uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>  uint32_t domainid = domid;
> -bool isstubdom = !libxl_is_stubdom(ctx, domid, &domainid);
> +bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>  
>  if (type == LIBXL_DOMAIN_TYPE_INVALID)
>  return ERROR_FAIL;
> -- 
> 2.1.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Fix errornous negation for isstubdom, which breaks HVM pci-passthrough.

2016-12-12 Thread Wei Liu
On Mon, Dec 12, 2016 at 09:17:42AM +0100, Cedric Bosdonnat wrote:
> Nice catch!
> ACK from me, thought I don't know if it really counts ;)
> 

It does.

I will add your acked-by while committing.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] x86emul: support LAR/LSL/VERR/VERW

2016-12-12 Thread Jan Beulich
This involves protmode_load_seg() accepting x86_seg_none as input, with
the meaning to
- suppress any exceptions other than #PF,
- not commit any state.

Signed-off-by: Jan Beulich 
---
v2: Extend commit message and add a respective code comment. Add
ASSERT()s to ensure/document that only #PF can make it out of
protmode_load_seg().

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -46,7 +46,47 @@ static int read(
 if ( verbose )
 printf("** %s(%u, %p,, %u,)\n", __func__, seg, (void *)offset, bytes);
 
-bytes_read += bytes;
+switch ( seg )
+{
+uint64_t value;
+
+case x86_seg_gdtr:
+/* Fake system segment type matching table index. */
+if ( (offset & 7) || (bytes > 8) )
+return X86EMUL_UNHANDLEABLE;
+#ifdef __x86_64__
+if ( !(offset & 8) )
+{
+memset(p_data, 0, bytes);
+return X86EMUL_OKAY;
+}
+value = (offset - 8) >> 4;
+#else
+value = (offset - 8) >> 3;
+#endif
+if ( value >= 0x10 )
+return X86EMUL_UNHANDLEABLE;
+value |= value << 40;
+memcpy(p_data, &value, bytes);
+return X86EMUL_OKAY;
+
+case x86_seg_ldtr:
+/* Fake user segment type matching table index. */
+if ( (offset & 7) || (bytes > 8) )
+return X86EMUL_UNHANDLEABLE;
+value = offset >> 3;
+if ( value >= 0x10 )
+return X86EMUL_UNHANDLEABLE;
+value |= (value | 0x10) << 40;
+memcpy(p_data, &value, bytes);
+return X86EMUL_OKAY;
+
+default:
+if ( !is_x86_user_segment(seg) )
+return X86EMUL_UNHANDLEABLE;
+bytes_read += bytes;
+break;
+}
 memcpy(p_data, (void *)offset, bytes);
 return X86EMUL_OKAY;
 }
@@ -75,6 +115,8 @@ static int write(
 if ( verbose )
 printf("** %s(%u, %p,, %u,)\n", __func__, seg, (void *)offset, bytes);
 
+if ( !is_x86_user_segment(seg) )
+return X86EMUL_UNHANDLEABLE;
 memcpy((void *)offset, p_data, bytes);
 return X86EMUL_OKAY;
 }
@@ -90,10 +132,24 @@ static int cmpxchg(
 if ( verbose )
 printf("** %s(%u, %p,, %u,)\n", __func__, seg, (void *)offset, bytes);
 
+if ( !is_x86_user_segment(seg) )
+return X86EMUL_UNHANDLEABLE;
 memcpy((void *)offset, new, bytes);
 return X86EMUL_OKAY;
 }
 
+static int read_segment(
+enum x86_segment seg,
+struct segment_register *reg,
+struct x86_emulate_ctxt *ctxt)
+{
+if ( !is_x86_user_segment(seg) )
+return X86EMUL_UNHANDLEABLE;
+memset(reg, 0, sizeof(*reg));
+reg->attr.fields.p = 1;
+return X86EMUL_OKAY;
+}
+
 static int cpuid(
 unsigned int *eax,
 unsigned int *ebx,
@@ -193,6 +249,21 @@ static int read_cr(
 return X86EMUL_UNHANDLEABLE;
 }
 
+static int read_msr(
+unsigned int reg,
+uint64_t *val,
+struct x86_emulate_ctxt *ctxt)
+{
+switch ( reg )
+{
+case 0xc080: /* EFER */
+*val = ctxt->addr_size > 32 ? 0x500 /* LME|LMA */ : 0;
+return X86EMUL_OKAY;
+}
+
+return X86EMUL_UNHANDLEABLE;
+}
+
 int get_fpu(
 void (*exception_callback)(void *, struct cpu_user_regs *),
 void *exception_callback_arg,
@@ -223,8 +294,10 @@ static struct x86_emulate_ops emulops =
 .insn_fetch = fetch,
 .write  = write,
 .cmpxchg= cmpxchg,
+.read_segment = read_segment,
 .cpuid  = cpuid,
 .read_cr= read_cr,
+.read_msr   = read_msr,
 .get_fpu= get_fpu,
 };
 
@@ -724,6 +797,156 @@ int main(int argc, char **argv)
 goto fail;
 printf("okay\n");
 
+printf("%-40s", "Testing lar (null selector)...");
+instr[0] = 0x0f; instr[1] = 0x02; instr[2] = 0xc1;
+regs.eflags = 0x240;
+regs.eip= (unsigned long)&instr[0];
+regs.ecx= 0;
+regs.eax= 0x;
+rc = x86_emulate(&ctxt, &emulops);
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eax != 0x) ||
+ (regs.eflags != 0x200) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+goto fail;
+printf("okay\n");
+
+printf("%-40s", "Testing lsl (null selector)...");
+instr[0] = 0x0f; instr[1] = 0x03; instr[2] = 0xca;
+regs.eflags = 0x240;
+regs.eip= (unsigned long)&instr[0];
+regs.edx= 0;
+regs.ecx= 0x;
+rc = x86_emulate(&ctxt, &emulops);
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.ecx != 0x) ||
+ (regs.eflags != 0x200) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+goto fail;
+printf("okay\n");
+
+printf("%-40s", "Testing verr (null selector)...");
+instr[0] = 0x0f; instr[1] = 0x00; instr[2] = 0x21;
+regs.eflags = 0x240;
+regs.eip= (unsigned long)&instr[0];
+regs.ecx= (unsigned long)res;
+*res= 0;
+rc = x86_emulate(&ctxt, &emulops);
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eflags != 0x200) ||

[Xen-devel] [PATCH v3 3/7] x86emul/test: factor out emul_test_{read_cr, cpuid}

2016-12-12 Thread Wei Liu
While at it, move xgetbv, all cpu_has_* and cache_line_size macros to
x86_emulate.h.

Signed-off-by: Wei Liu 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/tests/x86_emulator/test_x86_emulator.c | 103 +--
 tools/tests/x86_emulator/x86_emulate.c   |  39 ++
 tools/tests/x86_emulator/x86_emulate.h   |  72 +++
 3 files changed, 113 insertions(+), 101 deletions(-)

diff --git a/tools/tests/x86_emulator/test_x86_emulator.c 
b/tools/tests/x86_emulator/test_x86_emulator.c
index 0d80bff..e40f0ea 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -92,105 +92,6 @@ static int cmpxchg(
 return X86EMUL_OKAY;
 }
 
-static int cpuid(
-unsigned int *eax,
-unsigned int *ebx,
-unsigned int *ecx,
-unsigned int *edx,
-struct x86_emulate_ctxt *ctxt)
-{
-unsigned int leaf = *eax;
-
-asm ("cpuid" : "+a" (*eax), "+c" (*ecx), "=d" (*edx), "=b" (*ebx));
-
-/* The emulator doesn't itself use MOVBE, so we can always run the test. */
-if ( leaf == 1 )
-*ecx |= 1U << 22;
-
-return X86EMUL_OKAY;
-}
-
-#define cache_line_size() ({ \
-unsigned int eax = 1, ebx, ecx = 0, edx; \
-cpuid(&eax, &ebx, &ecx, &edx, NULL); \
-edx & (1U << 19) ? (ebx >> 5) & 0x7f8 : 0; \
-})
-
-#define cpu_has_mmx ({ \
-unsigned int eax = 1, ecx = 0, edx; \
-cpuid(&eax, &ecx, &ecx, &edx, NULL); \
-(edx & (1U << 23)) != 0; \
-})
-
-#define cpu_has_sse ({ \
-unsigned int eax = 1, ecx = 0, edx; \
-cpuid(&eax, &ecx, &ecx, &edx, NULL); \
-(edx & (1U << 25)) != 0; \
-})
-
-#define cpu_has_sse2 ({ \
-unsigned int eax = 1, ecx = 0, edx; \
-cpuid(&eax, &ecx, &ecx, &edx, NULL); \
-(edx & (1U << 26)) != 0; \
-})
-
-#define cpu_has_xsave ({ \
-unsigned int eax = 1, ecx = 0; \
-cpuid(&eax, &eax, &ecx, &eax, NULL); \
-/* Intentionally checking OSXSAVE here. */ \
-(ecx & (1U << 27)) != 0; \
-})
-
-static inline uint64_t xgetbv(uint32_t xcr)
-{
-uint32_t lo, hi;
-
-asm ( ".byte 0x0f, 0x01, 0xd0" : "=a" (lo), "=d" (hi) : "c" (xcr) );
-
-return ((uint64_t)hi << 32) | lo;
-}
-
-#define cpu_has_avx ({ \
-unsigned int eax = 1, ecx = 0; \
-cpuid(&eax, &eax, &ecx, &eax, NULL); \
-if ( !(ecx & (1U << 27)) || ((xgetbv(0) & 6) != 6) ) \
-ecx = 0; \
-(ecx & (1U << 28)) != 0; \
-})
-
-#define cpu_has_avx2 ({ \
-unsigned int eax = 1, ebx, ecx = 0; \
-cpuid(&eax, &ebx, &ecx, &eax, NULL); \
-if ( !(ecx & (1U << 27)) || ((xgetbv(0) & 6) != 6) ) \
-ebx = 0; \
-else { \
-eax = 7, ecx = 0; \
-cpuid(&eax, &ebx, &ecx, &eax, NULL); \
-} \
-(ebx & (1U << 5)) != 0; \
-})
-
-static int read_cr(
-unsigned int reg,
-unsigned long *val,
-struct x86_emulate_ctxt *ctxt)
-{
-/* Fake just enough state for the emulator's _get_fpu() to be happy. */
-switch ( reg )
-{
-case 0:
-*val = 0x0001; /* PE */
-return X86EMUL_OKAY;
-
-case 4:
-/* OSFXSR, OSXMMEXCPT, and maybe OSXSAVE */
-*val = 0x0600 | (cpu_has_xsave ? 0x0004 : 0);
-return X86EMUL_OKAY;
-}
-
-return X86EMUL_UNHANDLEABLE;
-}
-
 int get_fpu(
 void (*exception_callback)(void *, struct cpu_user_regs *),
 void *exception_callback_arg,
@@ -221,8 +122,8 @@ static struct x86_emulate_ops emulops = {
 .insn_fetch = fetch,
 .write  = write,
 .cmpxchg= cmpxchg,
-.cpuid  = cpuid,
-.read_cr= read_cr,
+.cpuid  = emul_test_cpuid,
+.read_cr= emul_test_read_cr,
 .get_fpu= get_fpu,
 };
 
diff --git a/tools/tests/x86_emulator/x86_emulate.c 
b/tools/tests/x86_emulator/x86_emulate.c
index 963dd71..8b70580 100644
--- a/tools/tests/x86_emulator/x86_emulate.c
+++ b/tools/tests/x86_emulator/x86_emulate.c
@@ -38,4 +38,43 @@ bool emul_test_make_stack_executable(void)
 MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC) == 0;
 }
 
+int emul_test_cpuid(
+unsigned int *eax,
+unsigned int *ebx,
+unsigned int *ecx,
+unsigned int *edx,
+struct x86_emulate_ctxt *ctxt)
+{
+unsigned int leaf = *eax;
+
+asm ("cpuid" : "+a" (*eax), "+c" (*ecx), "=d" (*edx), "=b" (*ebx));
+
+/* The emulator doesn't itself use MOVBE, so we can always run the test. */
+if ( leaf == 1 )
+*ecx |= 1U << 22;
+
+return X86EMUL_OKAY;
+}
+
+int emul_test_read_cr(
+unsigned int reg,
+unsigned long *val,
+struct x86_emulate_ctxt *ctxt)
+{
+/* Fake just enough state for the emulator's _get_fpu() to be happy. */
+switch ( reg )
+{
+case 0:
+*val = 0x0001; /* PE */
+return X86EMUL_OKAY;
+
+case 4:
+/* OSFXSR, OSXMMEXCPT, and maybe OSXSAVE */
+*val = 0x0600 | (cpu_has_xsave ? 0x0004 : 0);
+return X86EMUL_OKAY;
+}
+
+return X86EMUL_UNHANDLEABLE;
+}
+
 #include "x86_emulate/x86_emu

[Xen-devel] [PATCH v3 0/7] Fuzzing targets for oss-fuzz

2016-12-12 Thread Wei Liu
Hi all

This series adds two fuzzing targets to run in Google's oss-fuzz
infrastructure.

There will be some other patches on the oss-fuzz side. Their recommendation is
to have all the fuzzing targets committed in our tree so that they can be
kept up to date.

Please see the patch to add README for details on how this works.

Wei.

Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 


Wei Liu (7):
  tools/fuzz: introduce libelf target
  x86emul/test: factor out emul_test_make_stack_executable
  x86emul/test: factor out emul_test_{read_cr,cpuid}
  x86emul/test: factor out emul_test_get_fpu
  tools/fuzz: introduce x86 instruction emulator target
  tools: hook up fuzz directory
  tools/fuzz: add README

 .gitignore |   1 +
 tools/Makefile |   1 +
 tools/fuzz/Makefile|  11 ++
 tools/fuzz/README  |  39 +
 tools/fuzz/libelf/Makefile |  31 
 tools/fuzz/libelf/libelf-fuzzer.c  |  32 
 tools/fuzz/x86_instruction_emulator/Makefile   |  31 
 .../x86-insn-emulator-fuzzer.c | 195 +
 tools/tests/x86_emulator/test_x86_emulator.c   | 142 +--
 tools/tests/x86_emulator/x86_emulate.c |  84 +
 tools/tests/x86_emulator/x86_emulate.h |  81 +
 xen/common/libelf/libelf-private.h |   2 +
 12 files changed, 513 insertions(+), 137 deletions(-)
 create mode 100644 tools/fuzz/Makefile
 create mode 100644 tools/fuzz/README
 create mode 100644 tools/fuzz/libelf/Makefile
 create mode 100644 tools/fuzz/libelf/libelf-fuzzer.c
 create mode 100644 tools/fuzz/x86_instruction_emulator/Makefile
 create mode 100644 
tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 6/7] tools: hook up fuzz directory

2016-12-12 Thread Wei Liu
This will make all fuzzing targets get build every time tools directory
is built. This serves as basic regression test.

Signed-off-by: Wei Liu 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
---
 tools/Makefile  |  1 +
 tools/fuzz/Makefile | 11 +++
 2 files changed, 12 insertions(+)
 create mode 100644 tools/fuzz/Makefile

diff --git a/tools/Makefile b/tools/Makefile
index 71515b4..77e0723 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -6,6 +6,7 @@ SUBDIRS-y += include
 SUBDIRS-y += libs
 SUBDIRS-y += libxc
 SUBDIRS-y += flask
+SUBDIRS-y += fuzz
 SUBDIRS-y += xenstore
 SUBDIRS-y += misc
 SUBDIRS-y += examples
diff --git a/tools/fuzz/Makefile b/tools/fuzz/Makefile
new file mode 100644
index 000..ce00b82
--- /dev/null
+++ b/tools/fuzz/Makefile
@@ -0,0 +1,11 @@
+XEN_ROOT = $(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+SUBDIRS-y :=
+SUBDIRS-y += libelf
+SUBDIRS-y += x86_instruction_emulator
+
+.PHONY: all clean distclean
+all clean distclean: %: subdirs-%
+
+install:
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 1/7] tools/fuzz: introduce libelf target

2016-12-12 Thread Wei Liu
Source code and Makefile to fuzz libelf in Google's oss-fuzz
infrastructure.

Introduce FUZZ_NO_LIBXC in libelf-private.h. That macro will be set when
compiling libelf fuzzer target because libxc is not required in libelf
fuzzing.

Signed-off-by: Wei Liu 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
---
 tools/fuzz/libelf/Makefile | 31 +++
 tools/fuzz/libelf/libelf-fuzzer.c  | 32 
 xen/common/libelf/libelf-private.h |  2 ++
 3 files changed, 65 insertions(+)
 create mode 100644 tools/fuzz/libelf/Makefile
 create mode 100644 tools/fuzz/libelf/libelf-fuzzer.c

diff --git a/tools/fuzz/libelf/Makefile b/tools/fuzz/libelf/Makefile
new file mode 100644
index 000..0e9d40a
--- /dev/null
+++ b/tools/fuzz/libelf/Makefile
@@ -0,0 +1,31 @@
+XEN_ROOT = $(CURDIR)/../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+# libelf fuzz target
+vpath %.c ../../../xen/common/libelf
+CFLAGS += -I../../../xen/common/libelf
+ELF_SRCS-y += libelf-tools.c libelf-loader.c libelf-dominfo.c
+ELF_LIB_OBJS := $(patsubst %.c,%.o,$(ELF_SRCS-y))
+
+$(patsubst %.c,%.o,$(ELF_SRCS-y)): CFLAGS += -Wno-pointer-sign
+
+$(ELF_LIB_OBJS): CFLAGS += -DFUZZ_NO_LIBXC $(CFLAGS_xeninclude)
+
+libelf-fuzzer.o: CFLAGS += $(CFLAGS_xeninclude)
+
+libelf.a: $(ELF_LIB_OBJS)
+   $(AR) rc $@ $^
+
+.PHONY: libelf-fuzzer-all
+libelf-fuzzer-all: libelf.a libelf-fuzzer.o
+
+# Common targets
+.PHONY: all
+all: libelf-fuzzer-all
+
+.PHONY: distclean
+distclean: clean
+
+.PHONY: clean
+clean:
+   rm -f *.o *.a
diff --git a/tools/fuzz/libelf/libelf-fuzzer.c 
b/tools/fuzz/libelf/libelf-fuzzer.c
new file mode 100644
index 000..71561d3
--- /dev/null
+++ b/tools/fuzz/libelf/libelf-fuzzer.c
@@ -0,0 +1,32 @@
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+struct elf_binary elf_buf, *elf;
+struct elf_dom_parms parms;
+
+elf = &elf_buf;
+
+memset(elf, 0, sizeof(*elf));
+elf_init(elf, (const char *)data, size);
+elf_parse_binary(elf);
+elf_xen_parse(elf, &parms);
+
+return 0;
+}
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/libelf/libelf-private.h 
b/xen/common/libelf/libelf-private.h
index 388c3da..47db679 100644
--- a/xen/common/libelf/libelf-private.h
+++ b/xen/common/libelf/libelf-private.h
@@ -72,8 +72,10 @@
 #include 
 #include 
 
+#ifndef FUZZ_NO_LIBXC
 #include "xenctrl.h"
 #include "xc_private.h"
+#endif
 
 #define elf_msg(elf, fmt, args ... )\
 elf_call_log_callback(elf, 0, fmt , ## args );
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 4/7] x86emul/test: factor out emul_test_get_fpu

2016-12-12 Thread Wei Liu
Signed-off-by: Wei Liu 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/tests/x86_emulator/test_x86_emulator.c | 27 +--
 tools/tests/x86_emulator/x86_emulate.c   | 25 +
 tools/tests/x86_emulator/x86_emulate.h   |  6 ++
 3 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/tools/tests/x86_emulator/test_x86_emulator.c 
b/tools/tests/x86_emulator/test_x86_emulator.c
index e40f0ea..04b8ca6 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -92,31 +92,6 @@ static int cmpxchg(
 return X86EMUL_OKAY;
 }
 
-int get_fpu(
-void (*exception_callback)(void *, struct cpu_user_regs *),
-void *exception_callback_arg,
-enum x86_emulate_fpu_type type,
-struct x86_emulate_ctxt *ctxt)
-{
-switch ( type )
-{
-case X86EMUL_FPU_fpu:
-break;
-case X86EMUL_FPU_mmx:
-if ( cpu_has_mmx )
-break;
-case X86EMUL_FPU_xmm:
-if ( cpu_has_sse )
-break;
-case X86EMUL_FPU_ymm:
-if ( cpu_has_avx )
-break;
-default:
-return X86EMUL_UNHANDLEABLE;
-}
-return X86EMUL_OKAY;
-}
-
 static struct x86_emulate_ops emulops = {
 .read   = read,
 .insn_fetch = fetch,
@@ -124,7 +99,7 @@ static struct x86_emulate_ops emulops = {
 .cmpxchg= cmpxchg,
 .cpuid  = emul_test_cpuid,
 .read_cr= emul_test_read_cr,
-.get_fpu= get_fpu,
+.get_fpu= emul_test_get_fpu,
 };
 
 int main(int argc, char **argv)
diff --git a/tools/tests/x86_emulator/x86_emulate.c 
b/tools/tests/x86_emulator/x86_emulate.c
index 8b70580..a666a32 100644
--- a/tools/tests/x86_emulator/x86_emulate.c
+++ b/tools/tests/x86_emulator/x86_emulate.c
@@ -77,4 +77,29 @@ int emul_test_read_cr(
 return X86EMUL_UNHANDLEABLE;
 }
 
+int emul_test_get_fpu(
+void (*exception_callback)(void *, struct cpu_user_regs *),
+void *exception_callback_arg,
+enum x86_emulate_fpu_type type,
+struct x86_emulate_ctxt *ctxt)
+{
+switch ( type )
+{
+case X86EMUL_FPU_fpu:
+break;
+case X86EMUL_FPU_mmx:
+if ( cpu_has_mmx )
+break;
+case X86EMUL_FPU_xmm:
+if ( cpu_has_sse )
+break;
+case X86EMUL_FPU_ymm:
+if ( cpu_has_avx )
+break;
+default:
+return X86EMUL_UNHANDLEABLE;
+}
+return X86EMUL_OKAY;
+}
+
 #include "x86_emulate/x86_emulate.c"
diff --git a/tools/tests/x86_emulator/x86_emulate.h 
b/tools/tests/x86_emulator/x86_emulate.h
index 4cc3f72..b4d1555 100644
--- a/tools/tests/x86_emulator/x86_emulate.h
+++ b/tools/tests/x86_emulator/x86_emulate.h
@@ -109,3 +109,9 @@ int emul_test_read_cr(
 unsigned int reg,
 unsigned long *val,
 struct x86_emulate_ctxt *ctxt);
+
+int emul_test_get_fpu(
+void (*exception_callback)(void *, struct cpu_user_regs *),
+void *exception_callback_arg,
+enum x86_emulate_fpu_type type,
+struct x86_emulate_ctxt *ctxt);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 2/7] x86emul/test: factor out emul_test_make_stack_executable

2016-12-12 Thread Wei Liu
It will be used by emulator fuzzing target.

Signed-off-by: Wei Liu 
Acked-by: Jan Beulich 
---
 tools/tests/x86_emulator/test_x86_emulator.c | 12 ++--
 tools/tests/x86_emulator/x86_emulate.c   | 20 
 tools/tests/x86_emulator/x86_emulate.h   |  3 +++
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/tools/tests/x86_emulator/test_x86_emulator.c 
b/tools/tests/x86_emulator/test_x86_emulator.c
index eed8a0d..0d80bff 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -23,8 +23,6 @@ static const struct {
 #endif
 };
 
-#define MMAP_SZ 16384
-
 /* EFLAGS bit definitions. */
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
@@ -234,7 +232,6 @@ int main(int argc, char **argv)
 struct cpu_user_regs regs;
 char *instr;
 unsigned int *res, i, j;
-unsigned long sp;
 bool stack_exec;
 int rc;
 #ifndef __x86_64__
@@ -258,13 +255,8 @@ int main(int argc, char **argv)
 }
 instr = (char *)res + 0x100;
 
-#ifdef __x86_64__
-asm ("movq %%rsp, %0" : "=g" (sp));
-#else
-asm ("movl %%esp, %0" : "=g" (sp));
-#endif
-stack_exec = mprotect((void *)(sp & -0x1000L) - (MMAP_SZ - 0x1000),
-  MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC) == 0;
+stack_exec = emul_test_make_stack_executable();
+
 if ( !stack_exec )
 printf("Warning: Stack could not be made executable (%d).\n", errno);
 
diff --git a/tools/tests/x86_emulator/x86_emulate.c 
b/tools/tests/x86_emulator/x86_emulate.c
index 66c2464..963dd71 100644
--- a/tools/tests/x86_emulator/x86_emulate.c
+++ b/tools/tests/x86_emulator/x86_emulate.c
@@ -1,5 +1,7 @@
 #include "x86_emulate.h"
 
+#include 
+
 #define EFER_SCE   (1 << 0)
 #define EFER_LMA   (1 << 10)
 
@@ -18,4 +20,22 @@
 #define get_stub(stb) ((void *)((stb).addr = (uintptr_t)(stb).buf))
 #define put_stub(stb)
 
+bool emul_test_make_stack_executable(void)
+{
+unsigned long sp;
+
+/*
+ * Mark the entire stack executable so that the stub executions
+ * don't fault
+ */
+#ifdef __x86_64__
+asm ("movq %%rsp, %0" : "=g" (sp));
+#else
+asm ("movl %%esp, %0" : "=g" (sp));
+#endif
+
+return mprotect((void *)(sp & -0x1000L) - (MMAP_SZ - 0x1000),
+MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC) == 0;
+}
+
 #include "x86_emulate/x86_emulate.c"
diff --git a/tools/tests/x86_emulator/x86_emulate.h 
b/tools/tests/x86_emulator/x86_emulate.h
index 1981326..a9b874c 100644
--- a/tools/tests/x86_emulator/x86_emulate.h
+++ b/tools/tests/x86_emulator/x86_emulate.h
@@ -33,4 +33,7 @@
 
 #define is_canonical_address(x) (((int64_t)(x) >> 47) == ((int64_t)(x) >> 63))
 
+#define MMAP_SZ 16384
+bool emul_test_make_stack_executable(void);
+
 #include "x86_emulate/x86_emulate.h"
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 7/7] tools/fuzz: add README

2016-12-12 Thread Wei Liu
Signed-off-by: Wei Liu 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
---
 tools/fuzz/README | 39 +++
 1 file changed, 39 insertions(+)
 create mode 100644 tools/fuzz/README

diff --git a/tools/fuzz/README b/tools/fuzz/README
new file mode 100644
index 000..cf47bf6
--- /dev/null
+++ b/tools/fuzz/README
@@ -0,0 +1,39 @@
+# OVERVIEW
+
+This directory provides fuzzing targets to be run inside Google
+oss-fuzz infrastructure.
+
+See also https://github.com/google/oss-fuzz.
+
+# HOW IT WORKS
+
+We need to provide the source code and the rune to produce objects or
+archives (artefacts) from source code. These items ideally should live
+inside xen.git so that they can be kept up to date.
+
+The artefacts contain all the code we wish to fuzz and a function
+called LLVMFuzzerTestOneInput. LLVMFuzzerTestOneInput is the entry
+point to the code we wish to fuzz. Note that we don't produce
+executable programs because we don't have libFuzzEngine
+locally. libFuzzEngine is maintained by oss-fuzz.
+
+We also provide build script to oss-fuzz. The build script will
+inherit the correct compiler settings and be run in a pre-setup
+environment, which has libFuzzEngine installed. The build script is
+responsible for calling the correct Xen build rune to produce the
+artefacts, then link them against libFuzzEngine to produce
+executables, which will run in oss-fuzz infrastructure.
+
+Please refer to official oss-fuzz documents for the most up-to-date
+descriptions for all moving parts.
+
+# HOW TO IMPROVE THE FUZZING TARGETS
+
+Feel free to modify each fuzzing targets at will. Make sure they build
+by invoking make as you would build tools.
+
+To actually test the new code, you would need to run the target in
+standalone mode, please refer to oss-fuzz documents on how to do that.
+
+It is highly recommended that you run the new target for a while to
+weed out error in plumbing code to avoid false positives.
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 5/7] tools/fuzz: introduce x86 instruction emulator target

2016-12-12 Thread Wei Liu
Instruction emulator fuzzing code is from code previous written by
Andrew and George. Adapted to llvm fuzzer and hook up the build system.

Signed-off-by: Andrew Cooper 
Signed-off-by: George Dunlap 
Signed-off-by: Wei Liu 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 

v3:
1. coding style fix
2. share more code
3. exit when stack can't be made executable
---
 .gitignore |   1 +
 tools/fuzz/x86_instruction_emulator/Makefile   |  31 
 .../x86-insn-emulator-fuzzer.c | 195 +
 3 files changed, 227 insertions(+)
 create mode 100644 tools/fuzz/x86_instruction_emulator/Makefile
 create mode 100644 
tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c

diff --git a/.gitignore b/.gitignore
index a2f34a1..d507243 100644
--- a/.gitignore
+++ b/.gitignore
@@ -145,6 +145,7 @@ tools/flask/utils/flask-loadpolicy
 tools/flask/utils/flask-setenforce
 tools/flask/utils/flask-set-bool
 tools/flask/utils/flask-label-pci
+tools/fuzz/x86_instruction_emulator/x86_emulate*
 tools/helpers/_paths.h
 tools/helpers/init-xenstore-domain
 tools/helpers/xen-init-dom0
diff --git a/tools/fuzz/x86_instruction_emulator/Makefile 
b/tools/fuzz/x86_instruction_emulator/Makefile
new file mode 100644
index 000..2b147ac
--- /dev/null
+++ b/tools/fuzz/x86_instruction_emulator/Makefile
@@ -0,0 +1,31 @@
+XEN_ROOT=$(CURDIR)/../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+x86-instruction-emulator-fuzzer-all: x86-insn-emulator.a 
x86-insn-emulator-fuzzer.o
+
+x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h:
+   [ -L x86_emulate ] || ln -sf $(XEN_ROOT)/xen/arch/x86/x86_emulate .
+
+x86_emulate.c x86_emulate.h: %:
+   [ -L $* ] || ln -sf $(XEN_ROOT)/tools/tests/x86_emulator/$*
+
+CFLAGS += $(CFLAGS_xeninclude)
+
+x86_emulate.o: x86_emulate.c x86_emulate.h x86_emulate/x86_emulate.c 
x86_emulate/x86_emulate.h
+
+x86-insn-emulator.a: x86_emulate.o
+   $(AR) rc $@ $^
+
+x86-insn-emulator-fuzzer.o: x86-insn-emulator-fuzzer.c
+
+# Common targets
+.PHONY: all
+all: x86-instruction-emulator-fuzzer-all
+
+.PHONY: distclean
+distclean: clean
+   rm -f x86_emulate x86_emulate.c x86_emulate.h
+
+.PHONY: clean
+clean:
+   rm -f *.a *.o
diff --git a/tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c 
b/tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c
new file mode 100644
index 000..759f066
--- /dev/null
+++ b/tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c
@@ -0,0 +1,195 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "x86_emulate.h"
+
+static unsigned char data[4096];
+static unsigned int data_index = 0;
+static unsigned int data_max;
+
+static int data_read(const char *why, void *dst, unsigned int bytes)
+{
+unsigned i;
+
+if ( data_index + bytes > data_max )
+return X86EMUL_EXCEPTION;
+
+memcpy(dst,  data+data_index, bytes);
+data_index += bytes;
+
+printf("%s: ", why);
+for ( i = 0; i < bytes; i++ )
+printf(" %02x", (unsigned int)*(unsigned char *)(dst+i));
+printf("\n");
+
+return X86EMUL_OKAY;
+}
+
+static int fuzz_read(
+unsigned int seg,
+unsigned long offset,
+void *p_data,
+unsigned int bytes,
+struct x86_emulate_ctxt *ctxt)
+{
+return data_read("read", p_data, bytes);
+}
+
+static int fuzz_fetch(
+unsigned int seg,
+unsigned long offset,
+void *p_data,
+unsigned int bytes,
+struct x86_emulate_ctxt *ctxt)
+{
+return data_read("fetch", p_data, bytes);
+}
+
+static int fuzz_write(
+unsigned int seg,
+unsigned long offset,
+void *p_data,
+unsigned int bytes,
+struct x86_emulate_ctxt *ctxt)
+{
+return X86EMUL_OKAY;
+}
+
+static int fuzz_cmpxchg(
+unsigned int seg,
+unsigned long offset,
+void *old,
+void *new,
+unsigned int bytes,
+struct x86_emulate_ctxt *ctxt)
+{
+return X86EMUL_OKAY;
+}
+
+static struct x86_emulate_ops fuzz_emulops = {
+.read   = fuzz_read,
+.insn_fetch = fuzz_fetch,
+.write  = fuzz_write,
+.cmpxchg= fuzz_cmpxchg,
+.cpuid  = emul_test_cpuid,
+.read_cr= emul_test_read_cr,
+.get_fpu= emul_test_get_fpu,
+};
+
+#define CANONICALIZE(x) \
+do {\
+uint64_t _y = (x);  \
+if ( _y & (1ULL<<47) )  \
+_y |= (~0ULL)<<48;  \
+else\
+_y &= (1ULL<<48)-1; \
+printf("Canonicalized %" PRIx64 " to %" PRIx64 "\n", x, _y);\
+(x) = _y;   

[Xen-devel] [PATCH 0/3] x86emul: misc adjustments

2016-12-12 Thread Jan Beulich
These patches are grouped together merely because of contextual
dependencies.

1: correct EFLAGS.TF handling
2: conditionally clear BNDn for branches
3: some REX related polishing

Signed-off-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Question about porting IPMMU-VMSA Linux driver to XEN

2016-12-12 Thread Oleksandr Tyshchenko
Hi Stefano

On Sat, Dec 10, 2016 at 2:45 AM, Stefano Stabellini
 wrote:
> On Thu, 8 Dec 2016, Oleksandr Tyshchenko wrote:
>> On Thu, Dec 8, 2016 at 9:39 PM, Julien Grall  wrote:
>> >
>> >
>> > On 08/12/16 17:06, Oleksandr Tyshchenko wrote:
>> >>
>> >> Hi Julien,
>> >
>> >
>> > Hi Oleksandr,
>> Hi Julien,
>>
>> thank you for sharing your opinion.
>>
>> >
>> > As discussed on IRC, I CCed xen-devel and Stefano.
>> >
>> >> We would like to hear your opinion about the proper way of porting
>> >> kernel driver to XEN.
>> >> There is a Linux iommu driver "IPMMU VMSA" for supporting
>> >> VMSA-compatible IPMMUs that are integrated in the newest Renesas SoCs.
>> >> Mainline:
>> >> http://lxr.free-electrons.com/source/drivers/iommu/ipmmu-vmsa.c
>> >> But we would prefer to rely on code that hasn't reach upstream yet but
>> >> shipped with BSP for this platform and seems to be more complete:
>> >>
>> >> https://git.kernel.org/cgit/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.6/rcar-3.3.x
>> >>
>> >> For passthrough and future Remote processor (coproc) use-cases to work
>> >> on R-Car Gen3 based platforms we need this driver to be ported to XEN.
>> >> But I am in doubt how to do this in a right way.
>> >>
>> >> So, from my point of view there are two possible ways:
>> >> 1. Try to keep this driver as close as possible from Linux like you
>> >> did for arm-smmu driver. Even keeping the Linux style. I understand
>> >> the main goal despite the overhead and so on.
>> >> 2. Another way is to try to be as similar as possible from arm-smmu
>> >> driver in XEN. In such case many common for both drivers things might
>> >> be moved to the common part.
>> >
>> >
>> > I don't think you would be able to share a lot of code between arm-smmu and
>> > ipmmu-vmsa. The former is using stage-2 page table whilst ipmmu-vmsa is
>> > using stage-1 page table. So the page table would have to be unshared in
>> > your case. This is something we don't yet support on Xen ARM.
>> >
>> > Furthermore, I still want to keep arm-smmu as close as possible from Linux
>> > mainline. When I first implemented the driver I chose to not stick on 
>> > Linux,
>> > but we decided to re-sync few months later because it was too hard to
>> > maintain. One of the main advantage of keeping close to Linux is we can
>> > backport bug more easily.
>> I agree.
>> This is especially important when the driver is "new".
>> I mean when the driver is intensively developed. This leads to bunch of
>> fixes, features that should be taken. So, we will likely move in this
>> direction too.
>>
>> >
>> > I would prefer to see ipmmu-vmsa as close as Linux (BSP or mainline), but
>> > this is not a strong requirement.
>> I got it.
>>
>> > Do you know why the changes you are
>> > looking for are not yet upstreamed? What are the differences?
>> At the moment I don't know why these changes haven't upstreamed yet.
>> They look like features and bug fixes in most cases (multi context
>> support, etc).
>> So, I think it would be better to rely on the BSP at the first time
>> and then re-sync with
>> mainline when these changes reach upstream.
>
> I am not completely against importing a non-upstream driver, but be
> careful because not being upstream means that it hasn't been reviewed as
> much as the upstream counterpart. It might have bugs. Once it reaches
> upstream, it might be different and hard to sync.

I got your point. I will keep in mind it.

Thank you.

-- 
Regards,

Oleksandr Tyshchenko

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/7] tools/fuzz: introduce libelf target

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 10:28,  wrote:
> Source code and Makefile to fuzz libelf in Google's oss-fuzz
> infrastructure.
> 
> Introduce FUZZ_NO_LIBXC in libelf-private.h. That macro will be set when
> compiling libelf fuzzer target because libxc is not required in libelf
> fuzzing.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/7] x86emul/test: factor out emul_test_{read_cr, cpuid}

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 10:28,  wrote:
> While at it, move xgetbv, all cpu_has_* and cache_line_size macros to
> x86_emulate.h.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Jan Beulich 
with one further cosmetic request:

> --- a/tools/tests/x86_emulator/x86_emulate.c
> +++ b/tools/tests/x86_emulator/x86_emulate.c
> @@ -38,4 +38,43 @@ bool emul_test_make_stack_executable(void)
>  MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC) == 0;
>  }
>  
> +int emul_test_cpuid(
> +unsigned int *eax,
> +unsigned int *ebx,
> +unsigned int *ecx,
> +unsigned int *edx,
> +struct x86_emulate_ctxt *ctxt)
> +{
> +unsigned int leaf = *eax;
> +
> +asm ("cpuid" : "+a" (*eax), "+c" (*ecx), "=d" (*edx), "=b" (*ebx));
> +
> +/* The emulator doesn't itself use MOVBE, so we can always run the test. 
> */
> +if ( leaf == 1 )
> +*ecx |= 1U << 22;

The comment here wants some adjustment: "the test" is no longer
applicable. Perhaps "respective tests"?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 4/7] x86emul/test: factor out emul_test_get_fpu

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 10:28,  wrote:
> Signed-off-by: Wei Liu 

Reviewed-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] libxl: fix gentypes call in Makefile

2016-12-12 Thread Wei Liu
On Thu, Nov 10, 2016 at 05:46:00PM +0100, Cédric Bosdonnat wrote:
> From the make documentation:
> 
> "$* [...] If the target is `dir/a.foo.b' and the target pattern is
> `a.%.b' then the stem is `dir/foo'. In a static pattern rule, the
> stem is part of the file name that matched the `%' in the target
> pattern."
> 
> The rule generating the c types files from the idl ones is not
> a static pattern rule, but rather an implicit rule. Thus the value
> of $* is preceded by the file path, instead of only what matches %.
> 
> In order to get this fixed, drop the path using a $(notdir $*).
> 
> Signed-off-by: Cédric Bosdonnat 

Acked-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/7] x86emul/test: factor out emul_test_{read_cr, cpuid}

2016-12-12 Thread Wei Liu
On Mon, Dec 12, 2016 at 02:45:45AM -0700, Jan Beulich wrote:
> >>> On 12.12.16 at 10:28,  wrote:
> > While at it, move xgetbv, all cpu_has_* and cache_line_size macros to
> > x86_emulate.h.
> > 
> > Signed-off-by: Wei Liu 
> 
> Reviewed-by: Jan Beulich 
> with one further cosmetic request:
> 
> > --- a/tools/tests/x86_emulator/x86_emulate.c
> > +++ b/tools/tests/x86_emulator/x86_emulate.c
> > @@ -38,4 +38,43 @@ bool emul_test_make_stack_executable(void)
> >  MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC) == 0;
> >  }
> >  
> > +int emul_test_cpuid(
> > +unsigned int *eax,
> > +unsigned int *ebx,
> > +unsigned int *ecx,
> > +unsigned int *edx,
> > +struct x86_emulate_ctxt *ctxt)
> > +{
> > +unsigned int leaf = *eax;
> > +
> > +asm ("cpuid" : "+a" (*eax), "+c" (*ecx), "=d" (*edx), "=b" (*ebx));
> > +
> > +/* The emulator doesn't itself use MOVBE, so we can always run the 
> > test. */
> > +if ( leaf == 1 )
> > +*ecx |= 1U << 22;
> 
> The comment here wants some adjustment: "the test" is no longer
> applicable. Perhaps "respective tests"?
> 

Sure. I will make the change in my branch.

Wei.

> Jan
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/7] tools/fuzz: introduce x86 instruction emulator target

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 10:28,  wrote:
> Instruction emulator fuzzing code is from code previous written by
> Andrew and George. Adapted to llvm fuzzer and hook up the build system.
> 
> Signed-off-by: Andrew Cooper 
> Signed-off-by: George Dunlap 
> Signed-off-by: Wei Liu 
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> 
> v3:
> 1. coding style fix
> 2. share more code
> 3. exit when stack can't be made executable
> ---
>  .gitignore |   1 +
>  tools/fuzz/x86_instruction_emulator/Makefile   |  31 
>  .../x86-insn-emulator-fuzzer.c | 195 
> +
>  3 files changed, 227 insertions(+)
>  create mode 100644 tools/fuzz/x86_instruction_emulator/Makefile
>  create mode 100644 
> tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c
> 
> diff --git a/.gitignore b/.gitignore
> index a2f34a1..d507243 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -145,6 +145,7 @@ tools/flask/utils/flask-loadpolicy
>  tools/flask/utils/flask-setenforce
>  tools/flask/utils/flask-set-bool
>  tools/flask/utils/flask-label-pci
> +tools/fuzz/x86_instruction_emulator/x86_emulate*
>  tools/helpers/_paths.h
>  tools/helpers/init-xenstore-domain
>  tools/helpers/xen-init-dom0
> diff --git a/tools/fuzz/x86_instruction_emulator/Makefile 
> b/tools/fuzz/x86_instruction_emulator/Makefile
> new file mode 100644
> index 000..2b147ac
> --- /dev/null
> +++ b/tools/fuzz/x86_instruction_emulator/Makefile
> @@ -0,0 +1,31 @@
> +XEN_ROOT=$(CURDIR)/../../..
> +include $(XEN_ROOT)/tools/Rules.mk
> +
> +x86-instruction-emulator-fuzzer-all: x86-insn-emulator.a 
> x86-insn-emulator-fuzzer.o
> +
> +x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h:
> + [ -L x86_emulate ] || ln -sf $(XEN_ROOT)/xen/arch/x86/x86_emulate .
> +
> +x86_emulate.c x86_emulate.h: %:
> + [ -L $* ] || ln -sf $(XEN_ROOT)/tools/tests/x86_emulator/$*
> +
> +CFLAGS += $(CFLAGS_xeninclude)
> +
> +x86_emulate.o: x86_emulate.c x86_emulate.h x86_emulate/x86_emulate.c 
> x86_emulate/x86_emulate.h

Perhaps worthwhile shortening this to

x86_emulate.o: x86_emulate.[ch] x86_emulate/x86_emulate.[ch]

?

> --- /dev/null
> +++ b/tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c
> @@ -0,0 +1,195 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "x86_emulate.h"
> +
> +static unsigned char data[4096];
> +static unsigned int data_index = 0;

Pointless initializer.

> +static unsigned int data_max;
> +
> +static int data_read(const char *why, void *dst, unsigned int bytes)
> +{
> +unsigned i;

Please don't omit the "int" here (and in a few more places below)
when basically everywhere else it is present.

> +if ( data_index + bytes > data_max )
> +return X86EMUL_EXCEPTION;
> +
> +memcpy(dst,  data+data_index, bytes);

Blanks around binary operators please (more further down).

> +data_index += bytes;
> +
> +printf("%s: ", why);
> +for ( i = 0; i < bytes; i++ )
> +printf(" %02x", (unsigned int)*(unsigned char *)(dst+i));

Is the left most cast really needed here?

> +int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
> +{
> +bool stack_exec;
> +struct cpu_user_regs regs = {};
> +struct x86_emulate_ctxt ctxt =
> +{
> +.regs = ®s,
> +.addr_size = 8 * sizeof(void *),
> +.sp_size = 8 * sizeof(void *),
> +};
> +

Stray blank line. The indentation of the initializer above also looks
a little unusual.

> +unsigned nr = 0;
> +int rc;
> +unsigned x;
> +const uint8_t *p = data_p;
> +
> +stack_exec = emul_test_make_stack_executable();
> +if ( !stack_exec )
> +{
> +printf("Warning: Stack could not be made executable (%d).\n", errno);
> +exit(1);
> +}
> +
> +/* Reset all global states */

DYM "state"?

> +memset(data, 0, sizeof(data));
> +data_index = 0;
> +data_max = 0;
> +
> +nr = size < sizeof(regs) ? size : sizeof(regs);
> +
> +memcpy(®s, p, nr);
> +p += sizeof(regs);
> +nr += sizeof(regs);

I think this second += wants to be dropped, considering how nr
gets set above and used below.

> +if ( nr <= size )

< would seem more natural here.

> +{
> +memcpy(data, p, size - nr);
> +data_max = size - nr;
> +}
> +
> +ctxt.force_writeback = 0;

false

> +/* Zero 'private' entries */

s/entries/fields/ ?

> +regs.error_code = 0;
> +regs.entry_vector = 0;
> +
> +/* Use the upper bits of regs.eip to determine addr_size */
> +x = (regs.rip >> ADDR_SIZE_SHIFT) & 0x3;

This won't build as 32-bit code. If that's intentional, then I think
this would better be enforced in the Makefile (r

[Xen-devel] [PATCH 1/3] x86emul: correct EFLAGS.TF handling

2016-12-12 Thread Jan Beulich
For repeated string instructions we should not emulate multiple
iterations in one go when a single step trap needs injecting (which
needs to happen after every iteration).

For all non-branch instructions as well as not taken conditional
branches we additionally need to take DebugCtl.BTF into consideration.

And for mov-to/pop-into %ss there should be no #DB at all (EFLAGS.TF
remaining set means there'll be #DB after the next instruction).

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -415,6 +415,8 @@ typedef union {
 #define MSR_SYSENTER_CS  0x0174
 #define MSR_SYSENTER_ESP 0x0175
 #define MSR_SYSENTER_EIP 0x0176
+#define MSR_DEBUGCTL 0x01d9
+#define DEBUGCTL_BTF (1 << 1)
 #define MSR_EFER 0xc080
 #define MSR_STAR 0xc081
 #define MSR_LSTAR0xc082
@@ -751,6 +753,8 @@ do {
 rc = ops->insn_fetch(x86_seg_cs, ip, NULL, 0, ctxt);\
 if ( rc ) goto done;\
 _regs.eip = ip; \
+if ( _regs.eflags & EFLG_TF )   \
+ctxt->retire.singlestep = true; \
 } while (0)
 
 #define validate_far_branch(cs, ip) ({  \
@@ -767,6 +771,8 @@ do {
 #define commit_far_branch(cs, ip) ({\
 validate_far_branch(cs, ip);\
 _regs.eip = (ip);   \
+if ( _regs.eflags & EFLG_TF )   \
+ctxt->retire.singlestep = true; \
 ops->write_segment(x86_seg_cs, cs, ctxt);   \
 })
 
@@ -948,6 +954,9 @@ static inline void put_loop_count(
 }   \
 goto no_writeback;  \
 }   \
+if ( max_reps > 1 && (_regs.eflags & EFLG_TF) &&\
+ !is_branch_step(ctxt, ops) )   \
+max_reps = 1;   \
 max_reps;   \
 })
 
@@ -1637,6 +1646,16 @@ static bool is_aligned(enum x86_segment
 return !((reg.base + offs) & (size - 1));
 }
 
+static bool is_branch_step(struct x86_emulate_ctxt *ctxt,
+   const struct x86_emulate_ops *ops)
+{
+uint64_t debugctl;
+
+return ops->read_msr &&
+   ops->read_msr(MSR_DEBUGCTL, &debugctl, ctxt) == X86EMUL_OKAY &&
+   (debugctl & DEBUGCTL_BTF);
+}
+
 static bool umip_active(struct x86_emulate_ctxt *ctxt,
 const struct x86_emulate_ops *ops)
 {
@@ -3132,6 +3151,8 @@ x86_emulate(
 goto done;
 
 _regs.eip = imm1;
+if ( _regs.eflags & EFLG_TF )
+ctxt->retire.singlestep = true;
 break;
 
 case 0x9b:  /* wait/fwait */
@@ -4608,6 +4629,8 @@ x86_emulate(
  (rc = ops->write_segment(x86_seg_ss, &sreg, ctxt)) )
 goto done;
 
+if ( ctxt->regs->eflags & EFLG_TF )
+ctxt->retire.singlestep = true;
 break;
 }
 
@@ -4875,6 +4898,8 @@ x86_emulate(
 goto done;
 _regs.esp = lm ? msr_content : (uint32_t)msr_content;
 
+if ( _regs.eflags & EFLG_TF )
+ctxt->retire.singlestep = true;
 break;
 }
 
@@ -4914,6 +4939,9 @@ x86_emulate(
 
 _regs.eip = user64 ? _regs.edx : (uint32_t)_regs.edx;
 _regs.esp = user64 ? _regs.ecx : (uint32_t)_regs.ecx;
+
+if ( _regs.eflags & EFLG_TF )
+ctxt->retire.singlestep = true;
 break;
 }
 
@@ -5400,7 +5428,9 @@ x86_emulate(
 break;
 #endif
 default:
-goto cannot_emulate;
+cannot_emulate:
+rc = X86EMUL_UNHANDLEABLE;
+goto done;
 }
 
 switch ( dst.type )
@@ -5445,7 +5475,8 @@ x86_emulate(
 _regs.eip = (uint32_t)_regs.eip;
 
 /* Was singestepping active at the start of this instruction? */
-if ( (rc == X86EMUL_OKAY) && (ctxt->regs->eflags & EFLG_TF) )
+if ( (rc == X86EMUL_OKAY) && (ctxt->regs->eflags & EFLG_TF) &&
+ !is_branch_step(ctxt, ops) && !ctxt->retire.mov_ss )
 ctxt->retire.singlestep = true;
 
 *ctxt->regs = _regs;
@@ -5461,12 +5492,17 @@ x86_emulate(
  done:
 _put_fpu();
 put_stub(stub);
-return rc;
 
- cannot_emulate:
-_put_fpu();
-put_stub(stub);
-return X86EMUL_UNHANDLEABLE;
+/*
+ * We may have set the single step flag ahead of the last possible point
+ * of failure (unavoidably with the current near CALL code flow, but also
+ * used on some far branch paths to keep the code

[Xen-devel] [PATCH 2/3 v2] x86emul: conditionally clear BNDn for branches

2016-12-12 Thread Jan Beulich
Considering that we surface MPX to HVM guests, instructions we emulate
should also correctly deal with MPX state. While for now BND*
instructions don't get emulated, the effect of branches (which we do
emulate) without BND prefix should be taken care of.

No need to alter XABORT behavior: While not mentioned in the SDM so
far, this restores BNDn as they were at the XBEGIN, and since we make
XBEGIN abort right away, XABORT in the emulator is only a no-op.

Signed-off-by: Jan Beulich 
---
v2: Re-base. Address all RFC reasons based on feedback from Intel.
Re-work the actual clearing of BNDn.

--- a/tools/tests/x86_emulator/x86_emulate.c
+++ b/tools/tests/x86_emulator/x86_emulate.c
@@ -5,6 +5,8 @@
 
 #define cpu_has_amd_erratum(nr) 0
 #define mark_regs_dirty(r) ((void)(r))
+#define read_bndcfgu() 0
+#define xstate_set_init(what)
 
 /* For generic assembly code: use macros to define operation/operand sizes. */
 #ifdef __i386__
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -421,6 +421,8 @@ int vcpu_initialise(struct vcpu *v)
 
 vmce_init_vcpu(v);
 }
+else if ( (rc = xstate_alloc_save_area(v)) != 0 )
+return rc;
 
 spin_lock_init(&v->arch.vpmu.vpmu_lock);
 
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -417,6 +417,9 @@ typedef union {
 #define MSR_SYSENTER_EIP 0x0176
 #define MSR_DEBUGCTL 0x01d9
 #define DEBUGCTL_BTF (1 << 1)
+#define MSR_BNDCFGS  0x0d90
+#define BNDCFG_ENABLE(1 << 0)
+#define BNDCFG_PRESERVE  (1 << 1)
 #define MSR_EFER 0xc080
 #define MSR_STAR 0xc081
 #define MSR_LSTAR0xc082
@@ -1295,6 +1298,7 @@ static bool vcpu_has(
 #define vcpu_has_bmi1()vcpu_has( 7, EBX,  3, ctxt, ops)
 #define vcpu_has_hle() vcpu_has( 7, EBX,  4, ctxt, ops)
 #define vcpu_has_rtm() vcpu_has( 7, EBX, 11, ctxt, ops)
+#define vcpu_has_mpx() vcpu_has( 7, EBX, 14, ctxt, ops)
 
 #define vcpu_must_have(feat) \
 generate_exception_if(!vcpu_has_##feat(), EXC_UD)
@@ -1791,6 +1795,34 @@ static int inject_swint(enum x86_swint_t
 generate_exception(fault_type, error_code);
 }
 
+static void clear_bnd(struct x86_emulate_ctxt *ctxt,
+  const struct x86_emulate_ops *ops, enum vex_pfx pfx)
+{
+uint64_t bndcfg;
+int rc;
+
+if ( pfx == vex_f2 || !vcpu_has_mpx() )
+return;
+
+if ( !mode_ring0() )
+bndcfg = read_bndcfgu();
+else if ( !ops->read_msr ||
+  ops->read_msr(MSR_BNDCFGS, &bndcfg, ctxt) != X86EMUL_OKAY )
+return;
+if ( (bndcfg & BNDCFG_ENABLE) && !(bndcfg & BNDCFG_PRESERVE) )
+{
+/*
+ * Using BNDMK or any other MPX instruction here is pointless, as
+ * we run with MPX disabled ourselves, and hence they're all no-ops.
+ * Therefore we have two ways to clear BNDn: Enable MPX temporarily
+ * (in which case executing any suitable non-prefixed branch
+ * instruction would do), or use XRSTOR.
+ */
+xstate_set_init(XSTATE_BNDREGS);
+}
+ done:;
+}
+
 int x86emul_unhandleable_rw(
 enum x86_segment seg,
 unsigned long offset,
@@ -2975,6 +3007,7 @@ x86_emulate(
 case 0x70 ... 0x7f: /* jcc (short) */
 if ( test_cc(b, _regs.eflags) )
 jmp_rel((int32_t)src.val);
+clear_bnd(ctxt, ops, vex.pfx);
 break;
 
 case 0x82: /* Grp1 (x86/32 only) */
@@ -3340,6 +3373,7 @@ x86_emulate(
  (rc = ops->insn_fetch(x86_seg_cs, dst.val, NULL, 0, ctxt)) )
 goto done;
 _regs.eip = dst.val;
+clear_bnd(ctxt, ops, vex.pfx);
 break;
 
 case 0xc4: /* les */ {
@@ -4059,12 +4093,15 @@ x86_emulate(
 op_bytes = ((op_bytes == 4) && mode_64bit()) ? 8 : op_bytes;
 src.val = _regs.eip;
 jmp_rel(rel);
+clear_bnd(ctxt, ops, vex.pfx);
 goto push;
 }
 
 case 0xe9: /* jmp (near) */
 case 0xeb: /* jmp (short) */
 jmp_rel((int32_t)src.val);
+if ( !(b & 2) )
+clear_bnd(ctxt, ops, vex.pfx);
 break;
 
 case 0xea: /* jmp (far, absolute) */
@@ -4323,12 +4360,14 @@ x86_emulate(
 goto done;
 _regs.eip = src.val;
 src.val = dst.val;
+clear_bnd(ctxt, ops, vex.pfx);
 goto push;
 case 4: /* jmp (near) */
 if ( (rc = ops->insn_fetch(x86_seg_cs, src.val, NULL, 0, ctxt)) )
 goto done;
 _regs.eip = src.val;
 dst.type = OP_NONE;
+clear_bnd(ctxt, ops, vex.pfx);
 break;
 case 3: /* call (far, absolute indirect) */
 case 5: /* jmp (far, absolute indirect) */
@@ -5047,6 +5086,7 @@ x86_emulate(
 case X86EMUL_OPC(0x0f, 0x80) ... X86EMUL_OPC(0x0f, 0x8f): /* jcc (near) */
 if ( test_cc(b, _regs.eflags) )
 jmp_rel((int32_t)src.val);
+clear_bnd(ctxt, ops, ve

[Xen-devel] [PATCH 3/3] x86emul: some REX related polishing

2016-12-12 Thread Jan Beulich
While there are a few cases where it seems better to open-code REX_*
values, there's one where this clearly is a bad idea. And the SYSEXIT
emulation has no need to look at REX at all, it can simply use op_bytes
instead.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3082,7 +3082,7 @@ x86_emulate(
 
 case 0x90: /* nop / xchg %%r8,%%rax */
 case X86EMUL_OPC_F3(0, 0x90): /* pause / xchg %%r8,%%rax */
-if ( !(rex_prefix & 1) )
+if ( !(rex_prefix & REX_B) )
 break; /* nop / pause */
 /* fall through */
 
@@ -4897,7 +4897,6 @@ x86_emulate(
 case X86EMUL_OPC(0x0f, 0x35): /* sysexit */
 {
 uint64_t msr_content;
-bool user64 = rex_prefix & REX_W;
 
 generate_exception_if(!mode_ring0(), EXC_GP, 0);
 generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
@@ -4907,16 +4906,17 @@ x86_emulate(
 goto done;
 
 generate_exception_if(!(msr_content & 0xfffc), EXC_GP, 0);
-generate_exception_if(user64 && (!is_canonical_address(_regs.edx) ||
- !is_canonical_address(_regs.ecx)),
+generate_exception_if(op_bytes == 8 &&
+  (!is_canonical_address(_regs.edx) ||
+   !is_canonical_address(_regs.ecx)),
   EXC_GP, 0);
 
 cs.sel = (msr_content | 3) + /* SELECTOR_RPL_MASK */
- (user64 ? 32 : 16);
+ (op_bytes == 8 ? 32 : 16);
 cs.base = 0;   /* flat segment */
 cs.limit = ~0u;  /* 4GB limit */
-cs.attr.bytes = user64 ? 0xafb  /* L+DB+P+DPL3+S+Code */
-   : 0xcfb; /* G+DB+P+DPL3+S+Code */
+cs.attr.bytes = op_bytes == 8 ? 0xafb  /* L+DB+P+DPL3+S+Code */
+  : 0xcfb; /* G+DB+P+DPL3+S+Code */
 
 sreg.sel = cs.sel + 8;
 sreg.base = 0;   /* flat segment */
@@ -4928,8 +4928,8 @@ x86_emulate(
  (rc = ops->write_segment(x86_seg_ss, &sreg, ctxt)) != 0 )
 goto done;
 
-_regs.eip = user64 ? _regs.edx : (uint32_t)_regs.edx;
-_regs.esp = user64 ? _regs.ecx : (uint32_t)_regs.ecx;
+_regs.eip = op_bytes == 8 ? _regs.edx : (uint32_t)_regs.edx;
+_regs.esp = op_bytes == 8 ? _regs.ecx : (uint32_t)_regs.ecx;
 
 if ( _regs.eflags & EFLG_TF )
 ctxt->retire.singlestep = true;



x86emul: some REX related polishing

While there are a few cases where it seems better to open-code REX_*
values, there's one where this clearly is a bad idea. And the SYSEXIT
emulation has no need to look at REX at all, it can simply use op_bytes
instead.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3082,7 +3082,7 @@ x86_emulate(
 
 case 0x90: /* nop / xchg %%r8,%%rax */
 case X86EMUL_OPC_F3(0, 0x90): /* pause / xchg %%r8,%%rax */
-if ( !(rex_prefix & 1) )
+if ( !(rex_prefix & REX_B) )
 break; /* nop / pause */
 /* fall through */
 
@@ -4897,7 +4897,6 @@ x86_emulate(
 case X86EMUL_OPC(0x0f, 0x35): /* sysexit */
 {
 uint64_t msr_content;
-bool user64 = rex_prefix & REX_W;
 
 generate_exception_if(!mode_ring0(), EXC_GP, 0);
 generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
@@ -4907,16 +4906,17 @@ x86_emulate(
 goto done;
 
 generate_exception_if(!(msr_content & 0xfffc), EXC_GP, 0);
-generate_exception_if(user64 && (!is_canonical_address(_regs.edx) ||
- !is_canonical_address(_regs.ecx)),
+generate_exception_if(op_bytes == 8 &&
+  (!is_canonical_address(_regs.edx) ||
+   !is_canonical_address(_regs.ecx)),
   EXC_GP, 0);
 
 cs.sel = (msr_content | 3) + /* SELECTOR_RPL_MASK */
- (user64 ? 32 : 16);
+ (op_bytes == 8 ? 32 : 16);
 cs.base = 0;   /* flat segment */
 cs.limit = ~0u;  /* 4GB limit */
-cs.attr.bytes = user64 ? 0xafb  /* L+DB+P+DPL3+S+Code */
-   : 0xcfb; /* G+DB+P+DPL3+S+Code */
+cs.attr.bytes = op_bytes == 8 ? 0xafb  /* L+DB+P+DPL3+S+Code */
+  : 0xcfb; /* G+DB+P+DPL3+S+Code */
 
 sreg.sel = cs.sel + 8;
 sreg.base = 0;   /* flat segment */
@@ -4928,8 +4928,8 @@ x86_emulate(
  (rc = ops->write_segment(x86_seg_ss, &sreg, ctxt)) != 0 )
 goto done;
 
-_regs.eip = user64 ? _regs.edx : (uint32_t)_regs.edx;
-_regs.esp = user64 ? _regs.ecx : (uint32_t)_regs.ecx;
+_regs.eip = op_bytes == 8 ? _regs.edx : (uint32_t)_regs.edx;
+_regs.esp = op_bytes == 8 ? _regs.ecx : (uint32_t)_regs.ecx;
 
 if ( _regs.eflags & EFLG

Re: [Xen-devel] [PATCH] x86/time: don't omit newline in dump_softtsc()

2016-12-12 Thread Andrew Cooper
On 12/12/16 08:08, Jan Beulich wrote:
> Reported-by: Anton Samsonov 
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] x86emul: consolidate string insn address increments

2016-12-12 Thread Andrew Cooper
On 12/12/16 07:27, Jan Beulich wrote:
 On 09.12.16 at 18:11,  wrote:
>> On 09/12/16 15:22, Jan Beulich wrote:
>>> Move the looking at EFLAGS.DF into the macro, rendering all call sites
>>> more readable.
>>>
>>> Signed-off-by: Jan Beulich 
>> The net change is ok; it is certainly cleaner to read in the body of
>> x86_emulate().
>>
>> However, the naming of register_address_increment() was previously ok,
>> as it was obvious at the calling point that a negative increment was
>> possible.  This subtly is now hidden.
>>
>> How about reg_addr_adjust() or reg_addr_adjust_dir() as an alternative
>> name?  This retains the property that it is obvious that the direction
>> flag is followed in the calculation.
> That's fine with me; I admit that I've overlooked that the use of
> "increment" in the name has now become further disconnected
> from the actual operation. In line with
> _register_address_increment() I'd prefer to call it
> _register_address_adjust() though; in fact it might be reasonable
> to also replace "increment" with "adjust" in the other one (as it's
> similarly used for adjustments in both directions). What do you
> think?

register_address_adjust() was my first alternative name, so ok.

With that changed, Revewed-by: Andrew Cooper 
to save another trip on email.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-4.8-testing test] 103160: tolerable FAIL - PUSHED

2016-12-12 Thread osstest service owner
flight 103160 xen-4.8-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/103160/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64 14 guest-saverestore.2 fail in 
103103 pass in 103160
 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail in 103103 pass 
in 103160
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail in 103103 pass in 
103160
 test-armhf-armhf-libvirt  7 host-ping-check-xenfail pass in 103103
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail pass in 
103103
 test-amd64-i386-xl   19 guest-start/debian.repeat  fail pass in 103103
 test-amd64-amd64-xl-xsm  19 guest-start/debian.repeat  fail pass in 103103
 test-armhf-armhf-xl-vhd  10 guest-startfail pass in 103103
 test-armhf-armhf-xl-cubietruck 15 guest-start/debian.repeat fail pass in 103103
 test-amd64-amd64-libvirt-vhd 16 guest-start/debian.repeat  fail pass in 103103

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  9 debian-install   fail REGR. vs. 103036
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 102998
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 102998
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 103036
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 103036

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt12 migrate-support-check fail in 103103 never pass
 test-armhf-armhf-libvirt 13 saverestore-support-check fail in 103103 never pass
 test-armhf-armhf-xl-vhd 11 migrate-support-check fail in 103103 never pass
 test-armhf-armhf-xl-vhd 12 saverestore-support-check fail in 103103 never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  7967dafe6acce66193a8a81fa88ac4d3eb7b48aa
baseline version:
 xen  1f4ea1603570a91c486f2cd26c819d076f260f30

Last test of basis   103036  2016-12-07 16:58:05 Z4 days
Testing same since   103103  2016-12-08 19:26:08 Z3 days2 attempts


People who touched revisions under test:
  Ian Jackson 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf 

[Xen-devel] [PATCH] x86/emul: Annotate more intentional fallthrough cases

2016-12-12 Thread Andrew Cooper
Some recent change in x86_emulate.c has simplified the callgraph sufficiently
for Coverity to notice these, rather than hitting its upper path limit.

All are legitimate fallthoughs.  Annotate them as such.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/x86_emulate/x86_emulate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c 
b/xen/arch/x86/x86_emulate/x86_emulate.c
index 4a12302..75a3585 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -2644,8 +2644,8 @@ x86_emulate(
 ea.mem.off += (src.val >> 3) & ~(op_bytes - 1L);
 src.val &= (op_bytes << 3) - 1;
 }
-/* Becomes a normal DstMem operation from here on. */
 d = (d & ~DstMask) | DstMem;
+/* Becomes a normal DstMem operation from here on. */
 case DstMem:
 ea.bytes = (d & ByteOp) ? 1 : op_bytes;
 dst = ea;
@@ -2987,6 +2987,7 @@ x86_emulate(
 
 case 0x82: /* Grp1 (x86/32 only) */
 generate_exception_if(mode_64bit(), EXC_UD);
+/* Fallthrough. */
 case 0x80: case 0x81: case 0x83: /* Grp1 */
 switch ( modrm_reg & 7 )
 {
@@ -4314,6 +4315,7 @@ x86_emulate(
 
 case 0xfe: /* Grp4 */
 generate_exception_if((modrm_reg & 7) >= 2, EXC_UD);
+/* Fallthough. */
 case 0xff: /* Grp5 */
 switch ( modrm_reg & 7 )
 {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] x86/smpboot: Make logical package management more robust

2016-12-12 Thread Thomas Gleixner
The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg, These are obviously wrong.

 - Virtual machines have their own interesting of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is not longer
required.

Fixes: d49597fd3bc7 ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
---
 arch/x86/kernel/apic/apic.c  |   15 
 arch/x86/kernel/cpu/common.c |   24 ++--
 arch/x86/kernel/smpboot.c|   51 ---
 arch/x86/xen/smp.c   |6 -
 4 files changed, 27 insertions(+), 69 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2159,21 +2159,6 @@ int __generic_processor_info(int apicid,
}
 
/*
-* This can happen on physical hotplug. The sanity check at boot time
-* is done from native_smp_prepare_cpus() after num_possible_cpus() is
-* established.
-*/
-   if (topology_update_package_map(apicid, cpu) < 0) {
-   int thiscpu = max + disabled_cpus;
-
-   pr_warning("APIC: Package limit reached. Processor %d/0x%x 
ignored.\n",
-  thiscpu, apicid);
-
-   disabled_cpus++;
-   return -ENOSPC;
-   }
-
-   /*
 * Validate version
 */
if (version == 0x0) {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,29 +979,21 @@ static void x86_init_cache_qos(struct cp
 }
 
 /*
- * The physical to logical package id mapping is initialized from the
- * acpi/mptables information. Make sure that CPUID actually agrees with
- * that.
+ * Validate that ACPI/mptables have the same information about the
+ * effective APIC id and update the package map.
  */
-static void sanitize_package_id(struct cpuinfo_x86 *c)
+static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-   unsigned int pkg, apicid, cpu = smp_processor_id();
+   unsigned int apicid, cpu = smp_processor_id();
 
apicid = apic->cpu_present_to_apicid(cpu);
-   pkg = apicid >> boot_cpu_data.x86_coreid_bits;
 
-   if (apicid != c->initial_apicid) {
-   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: 
%x\n",
+   if (apicid != c->apicid) {
+   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: 
%x\n",
   cpu, apicid, c->initial_apicid);
-   c->initial_apicid = apicid;
}
-   if (pkg != c->phys_proc_id) {
-   pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of 
%u\n",
-  cpu, pkg, c->phys_proc_id);
-   c->phys_proc_id = pkg;
-   }
-   c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
+   BUG_ON(topology_update_package_map(c->phys_proc_id, cpu));
 #else
c->logical_proc_id = 0;
 #endif
@@ -1132,7 +1124,6 @@ static void identify_cpu(struct cpuinfo_
 #ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
 #endif
-   sanitize_package_id(c);
 }
 
 /*
@@ -1188,6 +1179,7 @@ void identify_secondary_cpu(struct cpuin
enable_sep_cpu();
 #endif
mtrr_ap_init();
+   validate_apic_and_package_id(c);
 }
 
 struct msr_range {
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -104,7 +104,6 @@ static unsigned int max_physical_pkg_id
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
 static unsigned int logical_packages __read_mostly;
-static bool logical_packages_frozen __read_mostly;
 
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
@@ -263,9 +262,14 @@ static void notrace start_secondary(void
cpu_startup_entry

Re: [Xen-devel] [PATCH] Fix errornous negation for isstubdom, which breaks HVM pci-passthrough.

2016-12-12 Thread Sander Eikelenboom
On 12/12/16 10:14, Wei Liu wrote:
> On Sat, Dec 10, 2016 at 06:59:08PM +0100, Sander Eikelenboom wrote:
>> Commit 20b75251d9721d9c050a973c02baac396c794ade introduced an errornous
>> negation which gave the isstubdom bool the opposite semantics,
>> causing the subsequent code to take the wrong code path.
>>
>> Signed-off-by: Sander Eikelenboom 
> 
> Acked-by: Wei Liu 
> 
> Though I would like to shorten the title a bit. That is, I will move the
> clause "which ..." to the end of the commit log.

Sure, thanks for taking care !

--
Sander

>   Commit 20b75251d9721d9c050a973c02baac396c794ade introduced an
>   erroneous negation which gave the isstubdom bool the opposite semantics,
>   causing the subsequent code to take the wrong code path, which breaks
>   ...
> 
>> ---
>>  tools/libxl/libxl_pci.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
>> index 3b707f3..8395352 100644
>> --- a/tools/libxl/libxl_pci.c
>> +++ b/tools/libxl/libxl_pci.c
>> @@ -1001,7 +1001,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
>> libxl_device_pci *pcidev, i
>>  int irq, i, rc, hvm = 0;
>>  uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>  uint32_t domainid = domid;
>> -bool isstubdom = !libxl_is_stubdom(ctx, domid, &domainid);
>> +bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>>  
>>  if (type == LIBXL_DOMAIN_TYPE_INVALID)
>>  return ERROR_FAIL;
>> -- 
>> 2.1.4
>>


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4] console: allow log level threshold adjustments

2016-12-12 Thread Jan Beulich
... from serial console so that one doesn't always need to reboot to
see more / less messages.

Note that upper thresholds are sticky, i.e. while they get adjusted
upwards when the lower threshold would otherwise end up above the upper
one, they don't get adjusted when reducing the lower one. Full
flexibility is available only via a future sysctl interface.

Note further that (meaningless) large threshold values aren't being
rejected, for the sake of not adding more checks to the code than are
really necessary for safe operation.

Signed-off-by: Jan Beulich 
---
v4: Split out sysctl.
v3: Mention apparently missing threshold upper bound checks in
description. Make flask_sysctl() accept the new operation.
v2: Add sysctl.

--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -175,7 +175,7 @@ static void __init parse_guest_loglvl(ch
 _parse_loglvl(s, &xenlog_guest_lower_thresh, &xenlog_guest_upper_thresh);
 }
 
-static char * __init loglvl_str(int lvl)
+static char *loglvl_str(int lvl)
 {
 switch ( lvl )
 {
@@ -188,6 +188,50 @@ static char * __init loglvl_str(int lvl)
 return "???";
 }
 
+static int *__read_mostly upper_thresh_adj = &xenlog_upper_thresh;
+static int *__read_mostly lower_thresh_adj = &xenlog_lower_thresh;
+static const char *__read_mostly thresh_adj = "standard";
+
+static void do_toggle_guest(unsigned char key, struct cpu_user_regs *regs)
+{
+if ( upper_thresh_adj == &xenlog_upper_thresh )
+{
+upper_thresh_adj = &xenlog_guest_upper_thresh;
+lower_thresh_adj = &xenlog_guest_lower_thresh;
+thresh_adj = "guest";
+}
+else
+{
+upper_thresh_adj = &xenlog_upper_thresh;
+lower_thresh_adj = &xenlog_lower_thresh;
+thresh_adj = "standard";
+}
+printk("'%c' pressed -> %s log level adjustments enabled\n",
+   key, thresh_adj);
+}
+
+static void do_adj_thresh(unsigned char key)
+{
+if ( *upper_thresh_adj < *lower_thresh_adj )
+*upper_thresh_adj = *lower_thresh_adj;
+printk("'%c' pressed -> %s log level: %s (rate limited %s)\n",
+   key, thresh_adj, loglvl_str(*lower_thresh_adj),
+   loglvl_str(*upper_thresh_adj));
+}
+
+static void do_inc_thresh(unsigned char key, struct cpu_user_regs *regs)
+{
+++*lower_thresh_adj;
+do_adj_thresh(key);
+}
+
+static void do_dec_thresh(unsigned char key, struct cpu_user_regs *regs)
+{
+if ( *lower_thresh_adj )
+--*lower_thresh_adj;
+do_adj_thresh(key);
+}
+
 /*
  * 
  * *** ACCESS TO CONSOLE RING *
@@ -816,6 +860,12 @@ void __init console_endboot(void)
 
 register_keyhandler('w', dump_console_ring_key,
 "synchronously dump console ring buffer (dmesg)", 0);
+register_irq_keyhandler('+', &do_inc_thresh,
+"increase log level threshold", 0);
+register_irq_keyhandler('-', &do_dec_thresh,
+"decrease log level threshold", 0);
+register_irq_keyhandler('G', &do_toggle_guest,
+"toggle host/guest log level adjustment", 0);
 
 /* Serial input is directed to DOM0 by default. */
 switch_serial_input();



console: allow log level threshold adjustments

... from serial console so that one doesn't always need to reboot to
see more / less messages.

Note that upper thresholds are sticky, i.e. while they get adjusted
upwards when the lower threshold would otherwise end up above the upper
one, they don't get adjusted when reducing the lower one. Full
flexibility is available only via a future sysctl interface.

Note further that (meaningless) large threshold values aren't being
rejected, for the sake of not adding more checks to the code than are
really necessary for safe operation.

Signed-off-by: Jan Beulich 
---
v4: Split out sysctl.
v3: Mention apparently missing threshold upper bound checks in
description. Make flask_sysctl() accept the new operation.
v2: Add sysctl.

--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -175,7 +175,7 @@ static void __init parse_guest_loglvl(ch
 _parse_loglvl(s, &xenlog_guest_lower_thresh, &xenlog_guest_upper_thresh);
 }
 
-static char * __init loglvl_str(int lvl)
+static char *loglvl_str(int lvl)
 {
 switch ( lvl )
 {
@@ -188,6 +188,50 @@ static char * __init loglvl_str(int lvl)
 return "???";
 }
 
+static int *__read_mostly upper_thresh_adj = &xenlog_upper_thresh;
+static int *__read_mostly lower_thresh_adj = &xenlog_lower_thresh;
+static const char *__read_mostly thresh_adj = "standard";
+
+static void do_toggle_guest(unsigned char key, struct cpu_user_regs *regs)
+{
+if ( upper_thresh_adj == &xenlog_upper_thresh )
+{
+upper_thresh_adj = &xenlog_guest_upper_thresh;
+lower_thresh_adj = &xenlog_guest_lower_thresh;
+thresh_adj = "guest";
+}
+else
+{
+upper_thresh_adj = &xenlog_upper_th

Re: [Xen-devel] [PATCH] x86/emul: Annotate more intentional fallthrough cases

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 11:05,  wrote:
> @@ -4314,6 +4315,7 @@ x86_emulate(
>  
>  case 0xfe: /* Grp4 */
>  generate_exception_if((modrm_reg & 7) >= 2, EXC_UD);
> +/* Fallthough. */

With the missing r added,
Reviewed-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] console: allow log level threshold adjustments

2016-12-12 Thread Wei Liu
On Mon, Dec 12, 2016 at 03:16:27AM -0700, Jan Beulich wrote:
> ... from serial console so that one doesn't always need to reboot to
> see more / less messages.
> 
> Note that upper thresholds are sticky, i.e. while they get adjusted
> upwards when the lower threshold would otherwise end up above the upper
> one, they don't get adjusted when reducing the lower one. Full
> flexibility is available only via a future sysctl interface.
> 
> Note further that (meaningless) large threshold values aren't being
> rejected, for the sake of not adding more checks to the code than are
> really necessary for safe operation.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 6/9] x86/SVM: Add AVIC vmexit handlers

2016-12-12 Thread Suravee Suthikulpanit

Hi Konrad,

Thanks for review comments.

On 10/14/2016 10:20 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Sep 19, 2016 at 12:52:45AM -0500, Suravee Suthikulpanit wrote:

AVIC introduces two #vmexit handlers:

+ * IPIs when the specified Message Type is Fixed
+ * (also known as fixed delivery mode) and
+ * the Trigger Mode is edge-triggered. The hardware
+ * also supports self and broadcast delivery modes
+ * specified via the Destination Shorthand(DSH)
+ * field of the ICRL. Logical and physical APIC ID
+ * formats are supported. All other IPI types cause
+ * a #VMEXIT, which needs to emulated.
+ */
+vlapic_reg_write(v, APIC_ICR2, icrh);
+vlapic_reg_write(v, APIC_ICR, icrl);
+break;
+case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
+{
+/*
+ * At this point, we expect that the AVIC HW has already
+ * set the appropriate IRR bits on the valid target
+ * vcpus. So, we just need to kick the appropriate vcpu.


Is there a pattern to this? Meaning does the CPU go sequentially
through the logical APIC ids - which (if say the guest is using
logical APIC ids which gives you pretty much 1-1 to physical) - means
that this VMEXIT would occur in a sequence of the vCPUs that are not
running?


Not sure if I follow this part. Typically, the current vcpu (the one 
that handles this #vmexit) is the one initiating IPI. Here we check the 
destination and try to kick each one of the matching destination.



Because if so, then ..

+
+for_each_vcpu ( d, c )


.. you could optimize this a bit and start at vCPU+1 (since you know
that this current vCPU most certainyl got the vCPU) ?



But the current (running) vcpu is not necessary the d->vcpu[0]. Do you 
mean checking if "c == current" in this loop and skip the current vcpu?



+break;
+case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
+dprintk(XENLOG_ERR,
+"SVM: %s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
+__func__, icrh, icrl, index);


That should never happen right? If it does that means we messed up the
allocation somehow. If so, should we disable AVIC for this guest
completly and log this error?


I think it depends on when this happens. It might be hard to determine 
the state of all VCPUs at that point. Shouldn't we just crash the domain 
(probably for AVIC_INCMP_IPI_ERR_INV_TARGET and default case as well)?



[... in arch/x86/hvm/svm/avic.c: avic_handle_dfr_update()]
+u32 mod = (dfr >> 28) & 0xf;
+
+/*
+ * We assume that all local APICs are using the same type.
+ * If this changes, we need to flush the AVIC logical
+ * APID id table.
+ */
+if ( d->ldr_mode == mod )
+return 0;
+
+clear_domain_page(_mfn(d->avic_log_ait_mfn));


Whoa. What if another vCPU is using this (aka handling another AVIC
access?)?


Generally, at least in Linux case, I can see that the kernel switch APIC 
routing from cluster to flat early in the boot process prior to setting 
LDR and bringing up the rest of the AP cores). Do you know of any cases 
where the guest OS could be switching the APIC routing mode while the AP 
cores are running?



I see that that the 'V' bit would be cleared so the CPU
would trap .. (thought that depends right - the clear_domain_page is
being done in a sequence so some of the later entries could be valid
while the CPU is clearing it).


Which "V" bit are you referring to here? And when is it cleared?


Perhaps you should iterate over all the 'V' bit first (to clear them)
and then clear the page using the clear_domain_page?
Or better - turn of AVIC first (for the guest), until the page has been cleared?


What about adding a check to see if DFR is changed after LDR has already 
been setup and throw some error or warning message for now?



[... in arch/x86/hvm/svm/avic.c: svm_avic_vmexit_do_noaccel()]
+BUG();
+avic_unaccel_trap_write(v);


Say the values are all messed up (the guest provides the wrong
APIC ID).. Shouldn't we inject some type of exception in the guest?
(Like the hardware would do? Actually - would the hardware send any type
of interrupt if one messes up with the APIC? Or is it that it sets an
error bit in the APIC?)


IIUC, typically, APIC generates APIC error interrupts and set the APIC 
Error Status Register (ESR) when detect errors during interrupt 
handling. However, if the APIC registers are not programmed correctly, I 
think the system would just fail to boot and go into undefined state.



[...]
+if ( !rw )
+*avic_get_bk_page_entry(v, offset) = vlapic_read_aligned(
+vcpu_vlapic(v), 
offset);
+
+hvm_mem_access_emulate_one(EMUL_KIND_NORMAL, TRAP_invalid_op,
+   HVM_DELIVER_NO_ERROR_CODE);


So we update the backing page .. and then inject an invalid_op in the
guest? Is that how hardware does it?



[Xen-devel] [PATCH] X86/VPMU:clear the overflow status of which counter happened overflow

2016-12-12 Thread Luwei Kang
If a counter happend overflow just clear corresponding bit of
IA32_PERF_GLOBAL_OVF_CTRL, rather than clear all the bit of this msr.

Signed-off-by: Luwei Kang 
---
 xen/arch/x86/cpu/vpmu_intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index e8049ed..613aafe 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -868,7 +868,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs 
*regs)
 if ( is_pmc_quirk )
 handle_pmc_quirk(msr_content);
 core2_vpmu_cxt->global_status |= msr_content;
-msr_content = ~global_ovf_ctrl_mask;
+msr_content &= ~global_ovf_ctrl_mask;
 wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
 }
 else
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command

2016-12-12 Thread Andre Przywara
Hi Stefano,

thanks for the prompt and helpful answer!

On 10/12/16 00:30, Stefano Stabellini wrote:
> On Fri, 9 Dec 2016, Andre Przywara wrote:
 I've been spending some time thinking about this, and I think we can in
 fact get away without ever propagating command from domains to the host.

 I made a list of all commands that possible require host ITS command
 propagation. There are two groups:
 1: enabling/disabling LPIs: INV, INVALL
 2: mapping/unmapping devices/events: DISCARD, MAPD, MAPTI.

 The second group can be handled by mapping all required devices up
 front, I will elaborate on that in a different email.

 For the first group, read below ...

 On 01/12/16 01:19, Stefano Stabellini wrote:
> On Fri, 25 Nov 2016, Julien Grall wrote:
>> Hi,
>>
>> On 18/11/16 18:39, Stefano Stabellini wrote:
>>> On Fri, 11 Nov 2016, Stefano Stabellini wrote:
 On Fri, 11 Nov 2016, Julien Grall wrote:
> On 10/11/16 20:42, Stefano Stabellini wrote:
> That's why in the approach we had on the previous series was "host ITS
> command
> should be limited when emulating guest ITS command". From my recall, 
> in
> that
> series the host and guest LPIs was fully separated (enabling a guest
> LPIs was
> not enabling host LPIs).

 I am interested in reading what Ian suggested to do when the physical
 ITS queue is full, but I cannot find anything specific about it in the
 doc.

 Do you have a suggestion for this?

 The only things that come to mind right now are:

 1) check if the ITS queue is full and busy loop until it is not 
 (spin_lock
 style)
 2) check if the ITS queue is full and sleep until it is not (mutex 
 style)
>>>
>>> Another, probably better idea, is to map all pLPIs of a device when the
>>> device is assigned to a guest (including Dom0). This is what was written
>>> in Ian's design doc. The advantage of this approach is that Xen doesn't
>>> need to take any actions on the physical ITS command queue when the
>>> guest issues virtual ITS commands, therefore completely solving this
>>> problem at the root. (Although I am not sure about enable/disable
>>> commands: could we avoid issuing enable/disable on pLPIs?)
>>
>> In the previous design document (see [1]), the pLPIs are enabled when the
>> device is assigned to the guest. This means that it is not necessary to 
>> send
>> command there. This is also means we may receive a pLPI before the 
>> associated
>> vLPI has been configured.
>>
>> That said, given that LPIs are edge-triggered, there is no deactivate 
>> state
>> (see 4.1 in ARM IHI 0069C). So as soon as the priority drop is done, the 
>> same
>> LPIs could potentially be raised again. This could generate a storm.
>
> Thank you for raising this important point. You are correct.
>
>> The priority drop is necessary if we don't want to block the reception of
>> interrupt for the current physical CPU.
>>
>> What I am more concerned about is this problem can also happen in normal
>> running (i.e the pLPI is bound to an vLPI and the vLPI is enabled). For
>> edge-triggered interrupt, there is no way to prevent them to fire again. 
>> Maybe
>> it is time to introduce rate-limit interrupt for ARM. Any opinions?
>
> Yes. It could be as simple as disabling the pLPI when Xen receives a
> second pLPI before the guest EOIs the first corresponding vLPI, which
> shouldn't happen in normal circumstances.
>
> We need a simple per-LPI inflight counter, incremented when a pLPI is
> received, decremented when the corresponding vLPI is EOIed (the LR is
> cleared).
>
> When the counter > 1, we disable the pLPI and request a maintenance
> interrupt for the corresponding vLPI.

 So why do we need a _counter_? This is about edge triggered interrupts,
 I think we can just accumulate all of them into one.
>>>
>>> The counter is not to re-inject the same amount of interrupts into the
>>> guest, but to detect interrupt storms.
>>
>> I was wondering if an interrupt "storm" could already be defined by
>> "receiving an LPI while there is already one pending (in the guest's
>> virtual pending table) and it being disabled by the guest". I admit that
>> declaring two interrupts as a storm is a bit of a stretch, but in fact
>> the guest had probably a reason for disabling it even though it
>> fires, so Xen should just follow suit.
>> The only difference is that we don't do it _immediately_ when the guest
>> tells us (via INV), but only if needed (LPI actually fires).
> 
> Either way should work OK, I think.
> 
> 
  - If the VLPI is enabled, we EOI it on the host and inject it.
  - If the VLPI 

Re: [Xen-devel] [PATCH 2/3] Add Xen colo support for qemu-upstream colo codes

2016-12-12 Thread Wei Liu
On Thu, Dec 01, 2016 at 01:20:18PM +, Wei Liu wrote:
> On Wed, Nov 30, 2016 at 05:47:51PM +0800, Zhang Chen wrote:
> > Because of qemu codes update, we update Xen colo block codes
> > 
> > Signed-off-by: Zhang Chen 
> 
> COLO being an experimental feature means that you can change it at will,
> so for both patches:
> 
> Acked-by: Wei Liu 
> 

Now that COLO is fixed on HV side, I've pushed these two toolstack
patches.

I adjusted the commit subject and text a bit while committing.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4/5] x86/shadow: Drop all emulation for PV vcpus

2016-12-12 Thread Andrew Cooper
Emulation is only performed for paging_mode_refcount() domains, which in
practice means HVM domains only.

Drop the PV emulation code.  As it always set addr_side and sp_size to
BITS_PER_LONG, it can't have worked correctly for PV guests running in a
different mode to Xen.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/shadow/common.c | 111 +++-
 xen/arch/x86/mm/shadow/multi.c  |  21 ++--
 2 files changed, 22 insertions(+), 110 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 84a87f3..2525a57 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -318,75 +318,6 @@ static const struct x86_emulate_ops 
hvm_shadow_emulator_ops = {
 .cpuid  = hvmemul_cpuid,
 };
 
-static int
-pv_emulate_read(enum x86_segment seg,
-unsigned long offset,
-void *p_data,
-unsigned int bytes,
-struct x86_emulate_ctxt *ctxt)
-{
-unsigned int rc;
-
-if ( !is_x86_user_segment(seg) )
-return X86EMUL_UNHANDLEABLE;
-
-if ( (rc = copy_from_user(p_data, (void *)offset, bytes)) != 0 )
-{
-x86_emul_pagefault(0, offset + bytes - rc, ctxt); /* Read fault. */
-return X86EMUL_EXCEPTION;
-}
-
-return X86EMUL_OKAY;
-}
-
-static int
-pv_emulate_write(enum x86_segment seg,
- unsigned long offset,
- void *p_data,
- unsigned int bytes,
- struct x86_emulate_ctxt *ctxt)
-{
-struct sh_emulate_ctxt *sh_ctxt =
-container_of(ctxt, struct sh_emulate_ctxt, ctxt);
-struct vcpu *v = current;
-if ( !is_x86_user_segment(seg) )
-return X86EMUL_UNHANDLEABLE;
-return v->arch.paging.mode->shadow.x86_emulate_write(
-v, offset, p_data, bytes, sh_ctxt);
-}
-
-static int
-pv_emulate_cmpxchg(enum x86_segment seg,
-   unsigned long offset,
-   void *p_old,
-   void *p_new,
-   unsigned int bytes,
-   struct x86_emulate_ctxt *ctxt)
-{
-struct sh_emulate_ctxt *sh_ctxt =
-container_of(ctxt, struct sh_emulate_ctxt, ctxt);
-unsigned long old, new;
-struct vcpu *v = current;
-
-if ( !is_x86_user_segment(seg) || bytes > sizeof(long) )
-return X86EMUL_UNHANDLEABLE;
-
-old = new = 0;
-memcpy(&old, p_old, bytes);
-memcpy(&new, p_new, bytes);
-
-return v->arch.paging.mode->shadow.x86_emulate_cmpxchg(
-   v, offset, old, new, bytes, sh_ctxt);
-}
-
-static const struct x86_emulate_ops pv_shadow_emulator_ops = {
-.read   = pv_emulate_read,
-.insn_fetch = pv_emulate_read,
-.write  = pv_emulate_write,
-.cmpxchg= pv_emulate_cmpxchg,
-.cpuid  = pv_emul_cpuid,
-};
-
 const struct x86_emulate_ops *shadow_init_emulation(
 struct sh_emulate_ctxt *sh_ctxt, struct cpu_user_regs *regs)
 {
@@ -394,17 +325,13 @@ const struct x86_emulate_ops *shadow_init_emulation(
 struct vcpu *v = current;
 unsigned long addr;
 
+ASSERT(has_hvm_container_vcpu(v));
+
 memset(sh_ctxt, 0, sizeof(*sh_ctxt));
 
 sh_ctxt->ctxt.regs = regs;
 sh_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
 
-if ( is_pv_vcpu(v) )
-{
-sh_ctxt->ctxt.addr_size = sh_ctxt->ctxt.sp_size = BITS_PER_LONG;
-return &pv_shadow_emulator_ops;
-}
-
 /* Segment cache initialisation. Primed with CS. */
 creg = hvm_get_seg_reg(x86_seg_cs, sh_ctxt);
 
@@ -441,24 +368,24 @@ void shadow_continue_emulation(struct sh_emulate_ctxt 
*sh_ctxt,
 struct vcpu *v = current;
 unsigned long addr, diff;
 
-/* We don't refetch the segment bases, because we don't emulate
- * writes to segment registers */
+ASSERT(has_hvm_container_vcpu(v));
 
-if ( is_hvm_vcpu(v) )
-{
-diff = regs->eip - sh_ctxt->insn_buf_eip;
-if ( diff > sh_ctxt->insn_buf_bytes )
-{
-/* Prefetch more bytes. */
-sh_ctxt->insn_buf_bytes =
-(!hvm_translate_linear_addr(
-x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
-hvm_access_insn_fetch, sh_ctxt, &addr) &&
- !hvm_fetch_from_guest_linear(
- sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, 
NULL))
-? sizeof(sh_ctxt->insn_buf) : 0;
-sh_ctxt->insn_buf_eip = regs->eip;
-}
+/*
+ * We don't refetch the segment bases, because we don't emulate
+ * writes to segment registers
+ */
+diff = regs->eip - sh_ctxt->insn_buf_eip;
+if ( diff > sh_ctxt->insn_buf_bytes )
+{
+/* Prefetch more bytes. */
+sh_ctxt->insn_buf_bytes =
+(!hvm_translate_linear_addr(
+x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
+hvm_access_insn_fetch, sh_ctxt, 

[Xen-devel] [PATCH 0/5] x86: Remove vestigal PV Autotranslate pieces

2016-12-12 Thread Andrew Cooper
Autotranslate PV domains haven't been able to be built for two releases of
Xen, and noone has noticed.  The shadow emulation code for such domains has
never functioned correctly for guests running in a mode different to Xen.

This isn't as much cleanup as I intended to do, but it turns out that I pulled
a little too hard on a thread, and everything fell to pieces.  This reduced
series has had moderate testing within XenServer, and everything still appears
to be fine.

Toolstack and PVH folk: There is also toolstack side cleanup which can be
done, but the concept of a translated PV guest is also used for PVHv1.  I have
some extra deletion to contribute to whomever rips PVHv1 out of the domain
builder.

Andrew Cooper (5):
  x86/traps: Drop paging_mode_external() handling from the PV pagefault path
  x86/shadow: Tweak some initialisation in sh_page_fault()
  x86/paging: Enforce PG_external == PG_translate == PG_refcounts
  x86/shadow: Drop all emulation for PV vcpus
  x86/shadow: Misc minor cleanup

 xen/arch/x86/mm/paging.c|  19 ---
 xen/arch/x86/mm/shadow/common.c | 112 +++-
 xen/arch/x86/mm/shadow/multi.c  |  46 -
 xen/arch/x86/traps.c|  16 ++
 xen/include/asm-x86/paging.h|   9 +++-
 5 files changed, 55 insertions(+), 147 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/5] x86/shadow: Tweak some initialisation in sh_page_fault()

2016-12-12 Thread Andrew Cooper
sh_page_fault() is a complicated function.  It aids clarity for the reader if
constant data is declared as such.

Declare struct npfec access and fetch_type_t ft as const, which requires
initialising them during declaration.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/shadow/multi.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index f494f7b..67c98b9 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2860,15 +2860,17 @@ static int sh_page_fault(struct vcpu *v,
 struct sh_emulate_ctxt emul_ctxt;
 const struct x86_emulate_ops *emul_ops;
 int r;
-fetch_type_t ft = 0;
 p2m_type_t p2mt;
 uint32_t rc;
 int version;
-struct npfec access = {
+const struct npfec access = {
  .read_access = 1,
+ .write_access = !!(regs->error_code & PFEC_write_access),
  .gla_valid = 1,
  .kind = npfec_kind_with_gla
 };
+const fetch_type_t ft =
+access.write_access ? ft_demand_write : ft_demand_read;
 #if SHADOW_OPTIMIZATIONS & SHOPT_FAST_EMULATION
 int fast_emul = 0;
 #endif
@@ -2878,9 +2880,6 @@ static int sh_page_fault(struct vcpu *v,
 
 perfc_incr(shadow_fault);
 
-if ( regs->error_code & PFEC_write_access )
-access.write_access = 1;
-
 #if SHADOW_OPTIMIZATIONS & SHOPT_FAST_EMULATION
 /* If faulting frame is successfully emulated in last shadow fault
  * it's highly likely to reach same emulation action for this frame.
@@ -3050,10 +3049,6 @@ static int sh_page_fault(struct vcpu *v,
 goto propagate;
 }
 
-/* What kind of access are we dealing with? */
-ft = ((regs->error_code & PFEC_write_access)
-  ? ft_demand_write : ft_demand_read);
-
 /* What mfn is the guest trying to access? */
 gfn = guest_l1e_get_gfn(gw.l1e);
 gmfn = get_gfn(d, gfn, &p2mt);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/5] x86/traps: Drop paging_mode_external() handling from the PV pagefault path

2016-12-12 Thread Andrew Cooper
PV guests necessarily can't be external, as Xen must steal address space from
them.  Pagefaults for HVM guests are handled by {vmx,svm}_vmexit_handler() and
don't enter the PV fixup_page_fault() path.

This paging_fault() callsite is therefore dead code, so drop it.

Clarify the comment at the other paging_fault() callsite to indicate that it
covers the logdirty case only.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/traps.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 013e633..b20faa4 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1799,15 +1799,6 @@ static int fixup_page_fault(unsigned long addr, struct 
cpu_user_regs *regs)
 if ( in_irq() || !(regs->eflags & X86_EFLAGS_IF) )
 return 0;
 
-/* Faults from external-mode guests are handled by shadow/hap */
-if ( paging_mode_external(d) && guest_mode(regs) )
-{
-int ret = paging_fault(addr, regs);
-if ( ret == EXCRET_fault_fixed )
-trace_trap_two_addr(TRC_PV_PAGING_FIXUP, regs->eip, addr);
-return ret;
-}
-
 if ( !(regs->error_code & PFEC_page_present) &&
   (pagefault_by_memadd(addr, regs)) )
 return handle_memadd_fault(addr, regs);
@@ -1838,8 +1829,11 @@ static int fixup_page_fault(unsigned long addr, struct 
cpu_user_regs *regs)
 return EXCRET_fault_fixed;
 }
 
-/* For non-external shadowed guests, we fix up both their own 
- * pagefaults and Xen's, since they share the pagetables. */
+/*
+ * For non-external shadowed guests (i.e. PV guests with logdirty
+ * active), we fix up both their own pagefaults and Xen's, since
+ * they share the pagetables.
+ */
 if ( paging_mode_enabled(d) && !paging_mode_external(d) )
 {
 int ret = paging_fault(addr, regs);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3/5] x86/paging: Enforce PG_external == PG_translate == PG_refcounts

2016-12-12 Thread Andrew Cooper
Setting PG_refcounts but not PG_translate is not useful.

While adjusting this, make a few other improvements.

 * Have paging_enable() unilaterally reject any unknown modes.
 * Correct the function description.  paging_enable() is also used to enable
   logdirty during runtime.
 * Drop the or'ing of PG_{HAP,SH}_enable.  The underlying functions already do
   this.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/paging.c| 19 +--
 xen/arch/x86/mm/shadow/common.c |  3 +--
 xen/include/asm-x86/paging.h|  9 +++--
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 853a035..9548222 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -842,23 +842,30 @@ void paging_final_teardown(struct domain *d)
 p2m_final_teardown(d);
 }
 
-/* Enable an arbitrary paging-assistance mode.  Call once at domain
- * creation. */
+/*
+ * Enable an arbitrary paging-assistance mode.  Call once at domain
+ * creation, and during runtime for logdirty mode.
+ */
 int paging_enable(struct domain *d, u32 mode)
 {
-switch ( mode & (PG_external | PG_translate) )
+/* Unrecognised paging mode? */
+if ( mode & ~PG_MASK )
+return -EINVAL;
+
+/* All of external|translate|refcounts, or none. */
+switch ( mode & (PG_external | PG_translate | PG_refcounts) )
 {
 case 0:
-case PG_external | PG_translate:
+case PG_external | PG_translate | PG_refcounts:
 break;
 default:
 return -EINVAL;
 }
 
 if ( hap_enabled(d) )
-return hap_enable(d, mode | PG_HAP_enable);
+return hap_enable(d, mode);
 else
-return shadow_enable(d, mode | PG_SH_enable);
+return shadow_enable(d, mode);
 }
 
 /* Called from the guest to indicate that a process is being torn down
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 39be564..84a87f3 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3144,8 +3144,7 @@ int shadow_enable(struct domain *d, u32 mode)
 domain_pause(d);
 
 /* Sanity check the arguments */
-if ( shadow_mode_enabled(d) ||
- ((mode & PG_translate) && !(mode & PG_refcounts)) )
+if ( shadow_mode_enabled(d) )
 {
 rv = -EINVAL;
 goto out_unlocked;
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index f83ed8b..1535659 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -57,6 +57,9 @@
  * requires VT or similar mechanisms */
 #define PG_external(XEN_DOMCTL_SHADOW_ENABLE_EXTERNAL << PG_mode_shift)
 
+/* All paging modes. */
+#define PG_MASK (PG_refcounts | PG_log_dirty | PG_translate | PG_external)
+
 #define paging_mode_enabled(_d)   (!!(_d)->arch.paging.mode)
 #define paging_mode_shadow(_d)(!!((_d)->arch.paging.mode & PG_SH_enable))
 #define paging_mode_hap(_d)   (!!((_d)->arch.paging.mode & PG_HAP_enable))
@@ -212,8 +215,10 @@ int paging_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void paging_final_teardown(struct domain *d);
 
-/* Enable an arbitrary paging-assistance mode.  Call once at domain
- * creation. */
+/*
+ * Enable an arbitrary paging-assistance mode.  Call once at domain
+ * creation, and during runtime for logdirty mode.
+ */
 int paging_enable(struct domain *d, u32 mode);
 
 #define paging_get_hostmode(v) ((v)->arch.paging.mode)
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 5/5] x86/shadow: Misc minor cleanup

2016-12-12 Thread Andrew Cooper
 * Move the #ifdefary inside sh_audit_gw() to avoid needing the else clause.
 * The walk_t parameter is only ever read, so make it const.
 * Use mfn_eq() rather than opencoding it.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Tim Deegan 
CC: George Dunlap 
---
 xen/arch/x86/mm/shadow/multi.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 713f23d..336d24f 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -321,11 +321,11 @@ gw_remove_write_accesses(struct vcpu *v, unsigned long 
va, walk_t *gw)
 return rc;
 }
 
-#if SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES
 /* Lightweight audit: pass all the shadows associated with this guest walk
  * through the audit mechanisms */
-static void sh_audit_gw(struct vcpu *v, walk_t *gw)
+static void sh_audit_gw(struct vcpu *v, const walk_t *gw)
 {
+#if SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES
 struct domain *d = v->domain;
 mfn_t smfn;
 
@@ -362,13 +362,9 @@ static void sh_audit_gw(struct vcpu *v, walk_t *gw)
   && mfn_valid(
   (smfn = get_fl1_shadow_status(d, guest_l2e_get_gfn(gw->l2e )
 (void) sh_audit_fl1_table(v, smfn, INVALID_MFN);
+#endif /* SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES */
 }
 
-#else
-#define sh_audit_gw(_v, _gw) do {} while(0)
-#endif /* audit code */
-
-
 /*
  * Write a new value into the guest pagetable, and update the shadows
  * appropriately.  Returns 0 if we page-faulted, 1 for success.
@@ -3309,7 +3305,7 @@ static int sh_page_fault(struct vcpu *v,
 }
 }
 #else /* 32 or 64 */
-used = (mfn_x(pagetable_get_mfn(tmp->arch.guest_table)) == 
mfn_x(gmfn));
+used = mfn_eq(pagetable_get_mfn(tmp->arch.guest_table), gmfn);
 #endif
 if ( used )
 break;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 4/5] tools/libacpi: update FADT layout to support version 5

2016-12-12 Thread Roger Pau Monne
On Wed, Dec 07, 2016 at 09:04:14AM -0700, Jan Beulich wrote:
> >>> On 05.12.16 at 16:04,  wrote:
> > --- a/tools/firmware/hvmloader/util.c
> > +++ b/tools/firmware/hvmloader/util.c
> > @@ -949,7 +949,7 @@ void hvmloader_acpi_build_tables(struct acpi_config 
> > *config,
> >  config->table_flags |= ACPI_HAS_SSDT_S4;
> >  
> >  config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC | 
> > ACPI_HAS_WAET |
> > -ACPI_HAS_VGA | ACPI_HAS_8042);
> > +ACPI_HAS_VGA | ACPI_HAS_8042 | ACPI_FADT_4);
> 
> It shouldn't be an explicit FADT version that gets requested, but
> a specific ACPI version (and I don't think a boolean flag is suitable
> for this).

Hm, I'm not opposed to this, but maybe then there are other tables to be bumped 
if we really want to request a specific ACPI version?
 
> > --- a/tools/libacpi/acpi2_0.h
> > +++ b/tools/libacpi/acpi2_0.h
> > @@ -169,7 +169,7 @@ struct acpi_10_fadt {
> >  /*
> >   * Fixed ACPI Description Table Structure (FADT).
> >   */
> > -struct acpi_20_fadt {
> > +struct acpi_50_fadt {
> 
> I think it would be better to drop the version number from this
> name and ...
> 
> > @@ -222,6 +222,8 @@ struct acpi_20_fadt {
> >  struct acpi_20_generic_address x_pm_tmr_blk;
> >  struct acpi_20_generic_address x_gpe0_blk;
> >  struct acpi_20_generic_address x_gpe1_blk;
> > +struct acpi_20_generic_address sleep_control;
> > +struct acpi_20_generic_address sleep_status;
> >  };
> 
> ... add version comments to the new fields.

Ack.

> > --- a/tools/libacpi/build.c
> > +++ b/tools/libacpi/build.c
> > @@ -503,12 +503,13 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct 
> > acpi_config *config)
> >  struct acpi_20_rsdp *rsdp;
> >  struct acpi_20_rsdt *rsdt;
> >  struct acpi_20_xsdt *xsdt;
> > -struct acpi_20_fadt *fadt;
> > +struct acpi_50_fadt *fadt;
> >  struct acpi_10_fadt *fadt_10;
> >  struct acpi_20_facs *facs;
> >  unsigned char   *dsdt;
> >  unsigned longsecondary_tables[ACPI_MAX_SECONDARY_TABLES];
> >  int  nr_secondaries, i;
> > +unsigned longfadt_size;
> 
> unsigned int would suffice here, wouldn't it? Otherwise it should be
> size_t.

Yes, unsigned int should be fine.

> > --- a/tools/libacpi/static_tables.c
> > +++ b/tools/libacpi/static_tables.c
> > @@ -38,11 +38,11 @@ struct acpi_20_facs Facs = {
> >  #define ACPI_PM_TMR_BLK_BIT_WIDTH   0x20
> >  #define ACPI_PM_TMR_BLK_BIT_OFFSET  0x00
> >  
> > -struct acpi_20_fadt Fadt = {
> > +struct acpi_50_fadt Fadt = {
> >  .header = {
> > -.signature= ACPI_2_0_FADT_SIGNATURE,
> > -.length   = sizeof(struct acpi_20_fadt),
> > -.revision = ACPI_2_0_FADT_REVISION,
> > +.signature= ACPI_FADT_SIGNATURE,
> > +.length   = sizeof(struct acpi_50_fadt),
> > +.revision = ACPI_5_0_FADT_REVISION,
> 
> Let's please not pre-initialize either value, but set both fields uniformly
> at runtime.

Ack.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] console: allow log level threshold adjustments

2016-12-12 Thread Andrew Cooper
On 12/12/16 10:16, Jan Beulich wrote:
> ... from serial console so that one doesn't always need to reboot to
> see more / less messages.
>
> Note that upper thresholds are sticky, i.e. while they get adjusted
> upwards when the lower threshold would otherwise end up above the upper
> one, they don't get adjusted when reducing the lower one. Full
> flexibility is available only via a future sysctl interface.
>
> Note further that (meaningless) large threshold values aren't being
> rejected, for the sake of not adding more checks to the code than are
> really necessary for safe operation.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 4/5] tools/libacpi: update FADT layout to support version 5

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 11:45,  wrote:
> On Wed, Dec 07, 2016 at 09:04:14AM -0700, Jan Beulich wrote:
>> >>> On 05.12.16 at 16:04,  wrote:
>> > --- a/tools/firmware/hvmloader/util.c
>> > +++ b/tools/firmware/hvmloader/util.c
>> > @@ -949,7 +949,7 @@ void hvmloader_acpi_build_tables(struct acpi_config 
>> > *config,
>> >  config->table_flags |= ACPI_HAS_SSDT_S4;
>> >  
>> >  config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC | 
>> > ACPI_HAS_WAET |
>> > -ACPI_HAS_VGA | ACPI_HAS_8042);
>> > +ACPI_HAS_VGA | ACPI_HAS_8042 | ACPI_FADT_4);
>> 
>> It shouldn't be an explicit FADT version that gets requested, but
>> a specific ACPI version (and I don't think a boolean flag is suitable
>> for this).
> 
> Hm, I'm not opposed to this, but maybe then there are other tables to be 
> bumped 
> if we really want to request a specific ACPI version?

Indeed. But I can't reasonably expect you to do the entire switch
within this series.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] X86/VPMU:clear the overflow status of which counter happened overflow

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 11:35,  wrote:
> If a counter happend overflow just clear corresponding bit of
> IA32_PERF_GLOBAL_OVF_CTRL, rather than clear all the bit of this msr.

The code change is fine, but the description appears to be wrong:
Isn't the change to avoid bits getting wrongly set, rather than any
getting wrongly cleared?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/5] x86/traps: Drop paging_mode_external() handling from the PV pagefault path

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 11:43,  wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1799,15 +1799,6 @@ static int fixup_page_fault(unsigned long addr, struct 
> cpu_user_regs *regs)
>  if ( in_irq() || !(regs->eflags & X86_EFLAGS_IF) )
>  return 0;
>  
> -/* Faults from external-mode guests are handled by shadow/hap */
> -if ( paging_mode_external(d) && guest_mode(regs) )
> -{
> -int ret = paging_fault(addr, regs);
> -if ( ret == EXCRET_fault_fixed )
> -trace_trap_two_addr(TRC_PV_PAGING_FIXUP, regs->eip, addr);
> -return ret;
> -}

Perhaps worthwhile leaving an ASSERT() with the inverted if()
condition here?

Anyway, with or without that
Reviewed-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/7] tools/fuzz: introduce x86 instruction emulator target

2016-12-12 Thread Wei Liu
On Mon, Dec 12, 2016 at 02:58:39AM -0700, Jan Beulich wrote:
> >>> On 12.12.16 at 10:28,  wrote:
> > Instruction emulator fuzzing code is from code previous written by
> > Andrew and George. Adapted to llvm fuzzer and hook up the build system.
> > 
> > Signed-off-by: Andrew Cooper 
> > Signed-off-by: George Dunlap 
> > Signed-off-by: Wei Liu 
> > ---
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Stefano Stabellini 
> > Cc: Tim Deegan 
> > Cc: Wei Liu 
> > 
> > v3:
> > 1. coding style fix
> > 2. share more code
> > 3. exit when stack can't be made executable
> > ---
> >  .gitignore |   1 +
> >  tools/fuzz/x86_instruction_emulator/Makefile   |  31 
> >  .../x86-insn-emulator-fuzzer.c | 195 
> > +
> >  3 files changed, 227 insertions(+)
> >  create mode 100644 tools/fuzz/x86_instruction_emulator/Makefile
> >  create mode 100644 
> > tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c
> > 
> > diff --git a/.gitignore b/.gitignore
> > index a2f34a1..d507243 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -145,6 +145,7 @@ tools/flask/utils/flask-loadpolicy
> >  tools/flask/utils/flask-setenforce
> >  tools/flask/utils/flask-set-bool
> >  tools/flask/utils/flask-label-pci
> > +tools/fuzz/x86_instruction_emulator/x86_emulate*
> >  tools/helpers/_paths.h
> >  tools/helpers/init-xenstore-domain
> >  tools/helpers/xen-init-dom0
> > diff --git a/tools/fuzz/x86_instruction_emulator/Makefile 
> > b/tools/fuzz/x86_instruction_emulator/Makefile
> > new file mode 100644
> > index 000..2b147ac
> > --- /dev/null
> > +++ b/tools/fuzz/x86_instruction_emulator/Makefile
> > @@ -0,0 +1,31 @@
> > +XEN_ROOT=$(CURDIR)/../../..
> > +include $(XEN_ROOT)/tools/Rules.mk
> > +
> > +x86-instruction-emulator-fuzzer-all: x86-insn-emulator.a 
> > x86-insn-emulator-fuzzer.o
> > +
> > +x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h:
> > +   [ -L x86_emulate ] || ln -sf $(XEN_ROOT)/xen/arch/x86/x86_emulate .
> > +
> > +x86_emulate.c x86_emulate.h: %:
> > +   [ -L $* ] || ln -sf $(XEN_ROOT)/tools/tests/x86_emulator/$*
> > +
> > +CFLAGS += $(CFLAGS_xeninclude)
> > +
> > +x86_emulate.o: x86_emulate.c x86_emulate.h x86_emulate/x86_emulate.c 
> > x86_emulate/x86_emulate.h
> 
> Perhaps worthwhile shortening this to
> 
> x86_emulate.o: x86_emulate.[ch] x86_emulate/x86_emulate.[ch]
> 
> ?

Done.

> 
> > --- /dev/null
> > +++ b/tools/fuzz/x86_instruction_emulator/x86-insn-emulator-fuzzer.c
> > @@ -0,0 +1,195 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "x86_emulate.h"
> > +
> > +static unsigned char data[4096];
> > +static unsigned int data_index = 0;
> 
> Pointless initializer.
> 

Done.

> > +static unsigned int data_max;
> > +
> > +static int data_read(const char *why, void *dst, unsigned int bytes)
> > +{
> > +unsigned i;
> 
> Please don't omit the "int" here (and in a few more places below)
> when basically everywhere else it is present.
> 

Done.

> > +if ( data_index + bytes > data_max )
> > +return X86EMUL_EXCEPTION;
> > +
> > +memcpy(dst,  data+data_index, bytes);
> 
> Blanks around binary operators please (more further down).
> 

Done.

> > +data_index += bytes;
> > +
> > +printf("%s: ", why);
> > +for ( i = 0; i < bytes; i++ )
> > +printf(" %02x", (unsigned int)*(unsigned char *)(dst+i));
> 
> Is the left most cast really needed here?
> 

No. I've deleted that.

> > +int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
> > +{
> > +bool stack_exec;
> > +struct cpu_user_regs regs = {};
> > +struct x86_emulate_ctxt ctxt =
> > +{
> > +.regs = ®s,
> > +.addr_size = 8 * sizeof(void *),
> > +.sp_size = 8 * sizeof(void *),
> > +};
> > +
> 
> Stray blank line. The indentation of the initializer above also looks
> a little unusual.
> 

Fixed.

> > +unsigned nr = 0;
> > +int rc;
> > +unsigned x;
> > +const uint8_t *p = data_p;
> > +
> > +stack_exec = emul_test_make_stack_executable();
> > +if ( !stack_exec )
> > +{
> > +printf("Warning: Stack could not be made executable (%d).\n", 
> > errno);
> > +exit(1);
> > +}
> > +
> > +/* Reset all global states */
> 
> DYM "state"?
> 

I mean "states". There are three states we need to reset.

> > +memset(data, 0, sizeof(data));
> > +data_index = 0;
> > +data_max = 0;
> > +
> > +nr = size < sizeof(regs) ? size : sizeof(regs);
> > +
> > +memcpy(®s, p, nr);
> > +p += sizeof(regs);
> > +nr += sizeof(regs);
> 
> I think this second += wants to be dropped, considering how nr
> gets set above and used below.
> 
> > +if ( nr <= size 

Re: [Xen-devel] [PATCH 2/5] x86/shadow: Tweak some initialisation in sh_page_fault()

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 11:43,  wrote:
> sh_page_fault() is a complicated function.  It aids clarity for the reader if
> constant data is declared as such.
> 
> Declare struct npfec access and fetch_type_t ft as const, which requires
> initialising them during declaration.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/5] x86/traps: Drop paging_mode_external() handling from the PV pagefault path

2016-12-12 Thread Tim Deegan
At 10:43 + on 12 Dec (1481539421), Andrew Cooper wrote:
> PV guests necessarily can't be external, as Xen must steal address space from
> them.  Pagefaults for HVM guests are handled by {vmx,svm}_vmexit_handler() and
> don't enter the PV fixup_page_fault() path.
> 
> This paging_fault() callsite is therefore dead code, so drop it.
> 
> Clarify the comment at the other paging_fault() callsite to indicate that it
> covers the logdirty case only.
> 
> No functional change.

IMO this is a change, just not on any supported config.

> -/* For non-external shadowed guests, we fix up both their own 
> - * pagefaults and Xen's, since they share the pagetables. */
> +/*
> + * For non-external shadowed guests (i.e. PV guests with logdirty
> + * active), we fix up both their own pagefaults and Xen's, since
> + * they share the pagetables.
> + */
>  if ( paging_mode_enabled(d) && !paging_mode_external(d) )

Here we can drop the check of !paging_mode_external(d), or maybe turn
it into an assertion somewhere.

With that,

Acked-by: Tim Deegan 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/7] tools/fuzz: introduce x86 instruction emulator target

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 12:19,  wrote:
> On Mon, Dec 12, 2016 at 02:58:39AM -0700, Jan Beulich wrote:
>> >>> On 12.12.16 at 10:28,  wrote:
>> > +/* Reset all global states */
>> 
>> DYM "state"?
> 
> I mean "states". There are three states we need to reset.

Hmm, to me as a non-native speaker it feels wrong to use plural
here (just like for e.g. milk), but maybe I'm wrong.

> From: Wei Liu 
> Date: Thu, 8 Dec 2016 12:09:54 +
> Subject: [PATCH] tools/fuzz: introduce x86 instruction emulator target
> 
> Instruction emulator fuzzing code is from code previous written by
> Andrew and George. Adapt it to llvm fuzzer and hook up the build system.
> 
> Signed-off-by: Andrew Cooper 
> Signed-off-by: George Dunlap 
> Signed-off-by: Wei Liu 
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> 
> v4:
> 1. more coding style fixes and bug fixes
> 2. only do 64 bit build

Reviewed-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/5] x86/shadow: Tweak some initialisation in sh_page_fault()

2016-12-12 Thread Tim Deegan
At 10:43 + on 12 Dec (1481539422), Andrew Cooper wrote:
> sh_page_fault() is a complicated function.  It aids clarity for the reader if
> constant data is declared as such.
> 
> Declare struct npfec access and fetch_type_t ft as const, which requires
> initialising them during declaration.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] X86/VPMU:clear the overflow status of which counter happened overflow

2016-12-12 Thread Kang, Luwei
> >>> On 12.12.16 at 11:35,  wrote:
> > If a counter happend overflow just clear corresponding bit of
> > IA32_PERF_GLOBAL_OVF_CTRL, rather than clear all the bit of this msr.
> 
> The code change is fine, but the description appears to be wrong:
> Isn't the change to avoid bits getting wrongly set, rather than any getting 
> wrongly cleared?
> 
just set the corresponding bits of which counter happened overflow, rather than 
set all the available bits of IA32_PERF_GLOBAL_OVF_CTRL when happened pmu 
interrupt.
Or
Remove the redundant bits set of IA32_PERF_GLOBAL_OVF_CTRL.

Is that OK?

Thank,
Luwei Kang


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/7] tools/fuzz: introduce x86 instruction emulator target

2016-12-12 Thread Wei Liu
On Mon, Dec 12, 2016 at 04:30:30AM -0700, Jan Beulich wrote:
> >>> On 12.12.16 at 12:19,  wrote:
> > On Mon, Dec 12, 2016 at 02:58:39AM -0700, Jan Beulich wrote:
> >> >>> On 12.12.16 at 10:28,  wrote:
> >> > +/* Reset all global states */
> >> 
> >> DYM "state"?
> > 
> > I mean "states". There are three states we need to reset.
> 
> Hmm, to me as a non-native speaker it feels wrong to use plural
> here (just like for e.g. milk), but maybe I'm wrong.
> 

"State" is countable when used to represent the condition of
somebody/something.  Various other meanings of "state" don't apply here.

Sources:

https://www.oxfordlearnersdictionaries.com/definition/english/state_1
http://www.ldoceonline.com/dictionary/state

> > From: Wei Liu 
> > Date: Thu, 8 Dec 2016 12:09:54 +
> > Subject: [PATCH] tools/fuzz: introduce x86 instruction emulator target
> > 
> > Instruction emulator fuzzing code is from code previous written by
> > Andrew and George. Adapt it to llvm fuzzer and hook up the build system.
> > 
> > Signed-off-by: Andrew Cooper 
> > Signed-off-by: George Dunlap 
> > Signed-off-by: Wei Liu 
> > ---
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Stefano Stabellini 
> > Cc: Tim Deegan 
> > Cc: Wei Liu 
> > 
> > v4:
> > 1. more coding style fixes and bug fixes
> > 2. only do 64 bit build
> 
> Reviewed-by: Jan Beulich 
> 

Thanks.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/5] x86/paging: Enforce PG_external == PG_translate == PG_refcounts

2016-12-12 Thread Tim Deegan
At 10:43 + on 12 Dec (1481539423), Andrew Cooper wrote:
> Setting PG_refcounts but not PG_translate is not useful.
> 
> While adjusting this, make a few other improvements.
> 
>  * Have paging_enable() unilaterally reject any unknown modes.
>  * Correct the function description.  paging_enable() is also used to enable
>logdirty during runtime.

Don't run-time log-dirty changes go through paging_log_dirty_{en,dis}able()?

Apart from that, Acked-by: Tim Deegan .

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] arm/mem_access: adjust check_and_get_page to not rely on current

2016-12-12 Thread Julien Grall



On 12/12/16 07:47, Tamas K Lengyel wrote:



On Dec 12, 2016 00:42, "Jan Beulich" mailto:jbeul...@suse.com>> wrote:

>>> On 09.12.16 at 20:59, mailto:tamas.leng...@zentific.com>> wrote:
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1461,7 +1461,8 @@ mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
>   * we indeed found a conflicting mem_access setting.
>   */
>  static struct page_info*
> -p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag)
> +p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
> +  const struct vcpu *v)
>  {
>  long rc;
>  paddr_t ipa;
> @@ -1470,7 +1471,7 @@ p2m_mem_access_check_and_get_page(vaddr_t
gva, unsigned long flag)
>  xenmem_access_t xma;
>  p2m_type_t t;
>  struct page_info *page = NULL;
> -struct p2m_domain *p2m = ¤t->domain->arch.p2m;
> +struct p2m_domain *p2m = &v->domain->arch.p2m;
>
>  rc = gva_to_ipa(gva, &ipa, flag);
>  if ( rc < 0 )
> @@ -1482,7 +1483,7 @@ p2m_mem_access_check_and_get_page(vaddr_t
gva, unsigned long flag)
>   * We do this first as this is faster in the default case when no
>   * permission is set on the page.
>   */
> -rc = __p2m_get_mem_access(current->domain, gfn, &xma);
> +rc = __p2m_get_mem_access(v->domain, gfn, &xma);
>  if ( rc < 0 )
>  goto err;
>
> @@ -1546,7 +1547,7 @@ p2m_mem_access_check_and_get_page(vaddr_t
gva, unsigned long flag)
>
>  page = mfn_to_page(mfn_x(mfn));
>
> -if ( unlikely(!get_page(page, current->domain)) )
> +if ( unlikely(!get_page(page, v->domain)) )
>  page = NULL;
>
>  err:

Looking at these changes, wouldn't it be more reasonable to hand
the function a struct domain *?


In its current state it might be but I believe with altp2m we will need
the vcpu struct here anyway.


Not only the altp2m. I keep mentioning a bigger issue, but it seems been 
ignored every time...


The translation VA to IPA (guest physical address) is done using 
hardware. If the underlying memory of the stage-1 page table is 
protected, so the translation will fail. Given that this function is 
used in hypercall to retrieve the page associated to a buffer, it means 
that it will not be possible to do hypercall when the page table used to 
find the buffer IPA has not been touched.


I believe this would require the vCPU in hand to fix this problem.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/5] x86/traps: Drop paging_mode_external() handling from the PV pagefault path

2016-12-12 Thread Andrew Cooper
On 12/12/16 11:27, Tim Deegan wrote:
> At 10:43 + on 12 Dec (1481539421), Andrew Cooper wrote:
>> PV guests necessarily can't be external, as Xen must steal address space from
>> them.  Pagefaults for HVM guests are handled by {vmx,svm}_vmexit_handler() 
>> and
>> don't enter the PV fixup_page_fault() path.
>>
>> This paging_fault() callsite is therefore dead code, so drop it.
>>
>> Clarify the comment at the other paging_fault() callsite to indicate that it
>> covers the logdirty case only.
>>
>> No functional change.
> IMO this is a change, just not on any supported config.

Really?  I'd agree if the content of the clause was reachable (and we
didn't support the configuration required to make it reachable), but my
argument here is that it is genuinely dead code and cannot be reached.

The only way to make paging_mode_external(d) true would cause Xen to be
using a different path to service the pagefault.

This particular piece of code makes me wonder whether HVM guests used to
use the PV pagefault path, but if they ever did, they don't any more.

>
>> -/* For non-external shadowed guests, we fix up both their own 
>> - * pagefaults and Xen's, since they share the pagetables. */
>> +/*
>> + * For non-external shadowed guests (i.e. PV guests with logdirty
>> + * active), we fix up both their own pagefaults and Xen's, since
>> + * they share the pagetables.
>> + */
>>  if ( paging_mode_enabled(d) && !paging_mode_external(d) )
> Here we can drop the check of !paging_mode_external(d), or maybe turn
> it into an assertion somewhere.
>
> With that,
>
> Acked-by: Tim Deegan 

Seeing as both you and Jan have asked, I will see about addition an
assertion.  However, it will have to be in patch 3 for bisectability,
when we rule out the use of PV refcount guests.  Are you ok with that?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-4.5-testing test] 103161: regressions - FAIL

2016-12-12 Thread Ian Jackson
Rosstest service owner writes ("[xen-4.5-testing test] 103161: regressions - 
FAIL"):
> flight 103161 xen-4.5-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/103161/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-xtf-amd64-amd64-2   52 leak-check/check fail REGR. vs. 102721
>  test-xtf-amd64-amd64-4   52 leak-check/check fail REGR. vs. 102721
>  test-xtf-amd64-amd64-3   52 leak-check/check fail REGR. vs. 102721
>  test-xtf-amd64-amd64-5   52 leak-check/check fail REGR. vs. 102721
>  test-xtf-amd64-amd64-1   52 leak-check/check fail REGR. vs. 102721

AIUI this is a known and accepted regression in Xen 4.5: some of the
strange things done by one of the XTF tests causes a guest crash
(after a security updates was applied), and XTF does not clean up
after itself particularly well.

Accordingly I have force pushed this:

> version targeted for testing:
>  xen  27be856265352c259c597ff696766a1668cf7564

(I have not killed the in-progress flight so there will be one further
report looking a bit like this.)

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/5] x86/shadow: Drop all emulation for PV vcpus

2016-12-12 Thread Tim Deegan
At 10:43 + on 12 Dec (1481539424), Andrew Cooper wrote:
> Emulation is only performed for paging_mode_refcount() domains, which in
> practice means HVM domains only.
> 
> Drop the PV emulation code.  As it always set addr_side and sp_size to
> BITS_PER_LONG, it can't have worked correctly for PV guests running in a
> different mode to Xen.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] tools: bump some library version numbers to 4.9

2016-12-12 Thread Ian Jackson
Wei Liu writes ("[PATCH] tools: bump some library version numbers to 4.9"):
> Bump the version number for libxc, lixlu, libxl and libvchan to 4.9.
> 
> Signed-off-by: Wei Liu 

Acked-by: Ian Jackson 

And pushed (since to check it was complete, I had to do the git-am
part anyway).

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/5] x86/shadow: Misc minor cleanup

2016-12-12 Thread Tim Deegan
At 10:43 + on 12 Dec (1481539425), Andrew Cooper wrote:
>  * Move the #ifdefary inside sh_audit_gw() to avoid needing the else clause.
>  * The walk_t parameter is only ever read, so make it const.
>  * Use mfn_eq() rather than opencoding it.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable-smoke test] 103198: tolerable all pass - PUSHED

2016-12-12 Thread osstest service owner
flight 103198 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/103198/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  e7dabe59c3239dc9ef9edbc49ed54f754616ebf7
baseline version:
 xen  f805e59c69263cdb0667e1f12fe3f5d8a26a2d3e

Last test of basis   103167  2016-12-11 20:46:37 Z0 days
Testing same since   103198  2016-12-12 09:13:19 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Julien Grall 
  Paul Durrant 
  Zhang Chen 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=e7dabe59c3239dc9ef9edbc49ed54f754616ebf7
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
e7dabe59c3239dc9ef9edbc49ed54f754616ebf7
+ branch=xen-unstable-smoke
+ revision=e7dabe59c3239dc9ef9edbc49ed54f754616ebf7
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.8-testing
+ '[' xe7dabe59c3239dc9ef9edbc49ed54f754616ebf7 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : git://git

Re: [Xen-devel] [PATCH 1/5] x86/traps: Drop paging_mode_external() handling from the PV pagefault path

2016-12-12 Thread Tim Deegan
At 11:45 + on 12 Dec (1481543145), Andrew Cooper wrote:
> On 12/12/16 11:27, Tim Deegan wrote:
> > At 10:43 + on 12 Dec (1481539421), Andrew Cooper wrote:
> >> PV guests necessarily can't be external, as Xen must steal address space 
> >> from
> >> them.  Pagefaults for HVM guests are handled by {vmx,svm}_vmexit_handler() 
> >> and
> >> don't enter the PV fixup_page_fault() path.
> >>
> >> This paging_fault() callsite is therefore dead code, so drop it.
> >>
> >> Clarify the comment at the other paging_fault() callsite to indicate that 
> >> it
> >> covers the logdirty case only.
> >>
> >> No functional change.
> > IMO this is a change, just not on any supported config.
> 
> Really?  I'd agree if the content of the clause was reachable (and we
> didn't support the configuration required to make it reachable), but my
> argument here is that it is genuinely dead code and cannot be reached.

Ah, you're right - I was thinking of refcounts mode.  External mode
guests can't get here.  In which case...

> >> +/*
> >> + * For non-external shadowed guests (i.e. PV guests with logdirty
> >> + * active), we fix up both their own pagefaults and Xen's, since
> >> + * they share the pagetables.
> >> + */
> >>  if ( paging_mode_enabled(d) && !paging_mode_external(d) )
> > Here we can drop the check of !paging_mode_external(d), or maybe turn
> > it into an assertion somewhere.
> >
> > With that,
> >
> > Acked-by: Tim Deegan 
> 
> Seeing as both you and Jan have asked, I will see about addition an
> assertion.  However, it will have to be in patch 3 for bisectability,
> when we rule out the use of PV refcount guests.  Are you ok with that?

...I don't follow you here -- since external-mode guests can't reach this
code we can surely switch from test to assert.

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] fix potential pa_range_info out of bound access

2016-12-12 Thread Julien Grall

Hi Stefano,

On 09/12/16 19:46, Stefano Stabellini wrote:

pa_range_info has only 8 elements and is accessed using pa_range as
index. pa_range is initialized to 16, potentially causing out of bound
access errors. Fix the issue by checking that pa_range is not greater
than the size of the array.

Coverity-ID: 1381865

Signed-off-by: Stefano Stabellini 

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e4991df..eb791db 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1639,7 +1639,8 @@ void __init setup_virt_paging(void)
 }

 /* pa_range is 4 bits, but the defined encodings are only 3 bits */
-if ( pa_range&0x8 || !pa_range_info[pa_range].pabits )
+if ( pa_range >= ARRAY_SIZE(pa_range_info) ||
+ pa_range&0x8 || !pa_range_info[pa_range].pabits )


I don't see any valid reason to check whether bit 3 is set (e.g the 
check "pa_range&0x8"). IHMO it was a bad way to check if pa_range was in 
the array index boundary.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/5] x86/paging: Enforce PG_external == PG_translate == PG_refcounts

2016-12-12 Thread Andrew Cooper
On 12/12/16 11:43, Tim Deegan wrote:
> At 10:43 + on 12 Dec (1481539423), Andrew Cooper wrote:
>> Setting PG_refcounts but not PG_translate is not useful.
>>
>> While adjusting this, make a few other improvements.
>>
>>  * Have paging_enable() unilaterally reject any unknown modes.
>>  * Correct the function description.  paging_enable() is also used to enable
>>logdirty during runtime.
> Don't run-time log-dirty changes go through paging_log_dirty_{en,dis}able()?

You are completely right.  Sorry for that.  I failed to observe that
logdirty gets pulled out before XEN_DOMCTL_SHADOW_OP_ENABLE is
propagated into shadow_domctl()

I will drop the comment adjustments.

>
> Apart from that, Acked-by: Tim Deegan .



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [DOC v2] Xen transport for 9pfs

2016-12-12 Thread Wei Liu
On Mon, Dec 05, 2016 at 05:33:23PM -0800, Stefano Stabellini wrote:
[...]
> ## Xenstore
> 
> The frontend and the backend connect via xenstore to exchange
> information. The toolstack creates front and back nodes with state
> [XenbusStateInitialising]. The protocol node name is **9pfs**.
> 
> Multiple rings are supported for each frontend and backend connection.
> 

It would help if you can state if a specific node is mandatory or
optional.

> ### Frontend XenBus Nodes
> ### Backend XenBus Nodes

[...]

> The producer always notifies the consumer after incrementing **prod**.
> However in some circumstances the producer is allowed not to notify the
> consumer, just as a performance improvement, and still maintain
> correctness. These are the steps to do it: after incrementing *prod*,
> the producer reads *cons* a second time; if the value is changed from
> the previous read and it is different from *prod* before the update,
> then the notification can be avoided. These are the producer steps, with
> the optimization:
> 
> - read *prod* (old_prod), *cons* (old_cons) from shared memory
> - general memory barrier
> - verify *prod* against local copy (consumer shouldn't change it)
> - write to array at position *prod* up to *cons*, wrapping around the circular
>   buffer when necessary
> - write memory barrier
> - increase *prod* (new_prod)
> - general memory barrier
> - read *cons* (new_cons)
> - if new_cons == old_cons or new_cons == old_prod, then notify the
>   consumer
> 

I think it would be valuable to extract this section to a generic "how
to write driver" doc. But that's probably something for another day.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH v8 0/3] Have OpenStack tested on top of xen's master and libvirt's master.

2016-12-12 Thread Anthony PERARD
ping?

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] X86/VPMU:clear the overflow status of which counter happened overflow

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 12:40,  wrote:
>> >>> On 12.12.16 at 11:35,  wrote:
>> > If a counter happend overflow just clear corresponding bit of
>> > IA32_PERF_GLOBAL_OVF_CTRL, rather than clear all the bit of this msr.
>> 
>> The code change is fine, but the description appears to be wrong:
>> Isn't the change to avoid bits getting wrongly set, rather than any getting 
>> wrongly cleared?
>> 
> just set the corresponding bits of which counter happened overflow, rather 
> than set all the available bits of IA32_PERF_GLOBAL_OVF_CTRL when happened 
> pmu interrupt.
> Or
> Remove the redundant bits set of IA32_PERF_GLOBAL_OVF_CTRL.

The former seems better to me.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [edk2] [PATCH RFC 10/14] UefiCpuPkg/BaseXApicX2ApicLib: Fix initialisation on my system and ...

2016-12-12 Thread Anthony PERARD
On Fri, Dec 09, 2016 at 06:48:47AM +, Kinney, Michael D wrote:
> Hi Anthony,
> 
> Can you provide more details on why you want to expose internal APIs
> in the library class?

I think I'm only using WriteLocalApicReg() to change the init counter.
Maybe I can just call InitializeApicTimer() again instead.

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [edk2] [PATCH RFC 10/14] UefiCpuPkg/BaseXApicX2ApicLib: Fix initialisation on my system and ...

2016-12-12 Thread Anthony PERARD
On Fri, Dec 09, 2016 at 10:43:30AM +, Andrew Cooper wrote:
> On 09/12/16 06:48, Kinney, Michael D wrote:
> > Hi Anthony,
> >
> > Can you provide more details on why you want to expose internal APIs
> > in the library class?
> >
> > What is the specific issue?  Is the Local APIC in your environment 
> > not behaving the same as real HW?
> 
> Xen's LAPIC emulation should match real hardware (because we might even
> substitute it for real hardware APIC virtualisation support).
> 
> If Xen's LAPIC doesn't behave like real hardware, we should fix that,
> but first we should understand what the correct behaviour should be.

I'll try to find out what the real hardware is doing.

At least, I think there where nothing in the Intel manual about the
initial counter been reset.

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/5] x86/paging: Enforce PG_external == PG_translate == PG_refcounts

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 12:59,  wrote:
> On 12/12/16 11:43, Tim Deegan wrote:
>> At 10:43 + on 12 Dec (1481539423), Andrew Cooper wrote:
>>> Setting PG_refcounts but not PG_translate is not useful.
>>>
>>> While adjusting this, make a few other improvements.
>>>
>>>  * Have paging_enable() unilaterally reject any unknown modes.
>>>  * Correct the function description.  paging_enable() is also used to enable
>>>logdirty during runtime.
>> Don't run-time log-dirty changes go through paging_log_dirty_{en,dis}able()?
> 
> You are completely right.  Sorry for that.  I failed to observe that
> logdirty gets pulled out before XEN_DOMCTL_SHADOW_OP_ENABLE is
> propagated into shadow_domctl()
> 
> I will drop the comment adjustments.

With that feel free to also add
Reviewed-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/5] x86/traps: Drop paging_mode_external() handling from the PV pagefault path

2016-12-12 Thread Andrew Cooper
On 12/12/16 11:57, Tim Deegan wrote:
> At 11:45 + on 12 Dec (1481543145), Andrew Cooper wrote:
>> On 12/12/16 11:27, Tim Deegan wrote:
>>> At 10:43 + on 12 Dec (1481539421), Andrew Cooper wrote:
 PV guests necessarily can't be external, as Xen must steal address space 
 from
 them.  Pagefaults for HVM guests are handled by {vmx,svm}_vmexit_handler() 
 and
 don't enter the PV fixup_page_fault() path.

 This paging_fault() callsite is therefore dead code, so drop it.

 Clarify the comment at the other paging_fault() callsite to indicate that 
 it
 covers the logdirty case only.

 No functional change.
>>> IMO this is a change, just not on any supported config.
>> Really?  I'd agree if the content of the clause was reachable (and we
>> didn't support the configuration required to make it reachable), but my
>> argument here is that it is genuinely dead code and cannot be reached.
> Ah, you're right - I was thinking of refcounts mode.  External mode
> guests can't get here.  In which case...
>
 +/*
 + * For non-external shadowed guests (i.e. PV guests with logdirty
 + * active), we fix up both their own pagefaults and Xen's, since
 + * they share the pagetables.
 + */
  if ( paging_mode_enabled(d) && !paging_mode_external(d) )
>>> Here we can drop the check of !paging_mode_external(d), or maybe turn
>>> it into an assertion somewhere.
>>>
>>> With that,
>>>
>>> Acked-by: Tim Deegan 
>> Seeing as both you and Jan have asked, I will see about addition an
>> assertion.  However, it will have to be in patch 3 for bisectability,
>> when we rule out the use of PV refcount guests.  Are you ok with that?
> ...I don't follow you here -- since external-mode guests can't reach this
> code we can surely switch from test to assert.

My plan was to add something like this:

if ( paging_mode_enabled(d) )
ASSERT(paging_mode_only_logdirty(d));

for a suitable new predicate.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 06/15] domctl: Add XEN_DOMCTL_acpi_access

2016-12-12 Thread Boris Ostrovsky



On 12/02/2016 02:48 AM, Jan Beulich wrote:

On 01.12.16 at 17:43,  wrote:

On 12/01/2016 11:06 AM, Jan Beulich wrote:



+++ b/xen/include/public/domctl.h
@@ -1144,6 +1144,29 @@ struct xen_domctl_psr_cat_op {
 typedef struct xen_domctl_psr_cat_op xen_domctl_psr_cat_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cat_op_t);

+/* ACPI Generic Address Structure */
+typedef struct gas {

xen_acpi_gas


+#define XEN_ACPI_SYSTEM_MEMORY 0
+#define XEN_ACPI_SYSTEM_IO 1
+uint8_tspace_id;   /* Address space */
+uint8_tbit_width;  /* Size in bits of given register */
+uint8_tbit_offset; /* Bit offset within the register */
+uint8_taccess_width;   /* Minimum Access size (ACPI 3.0) */
+uint64_t   address;/* 64-bit address of register */

uint64_aligned_t with explicit padding added ahead of it.

And then there's the question of what uses of this will look like:
I'm not convinced we need to stick to the exact ACPI layout
here, unless you expect (or could imagine) for the tool stack to
hold GAS structures coming from elsewhere in its hands. If we
don't follow the layout as strictly, we could namely widen
bit_width (and maybe bit_offset) to allow for larger transfers
in one go. And in such a relaxed model I don't think we'd need
access_width at all as a field.


There is indeed no current need to use actual ACPI GAS layout but then
it's not GAS, really, and should be named something else.


Which of course is fine by me; I had referred to that structure only
for the underlying principle of specifying how to access the data.


Are there any registers that are not byte-aligned or not whole number of 
bytes?


I am thinking about dropping bit_offset (along with access_width) and 
making bit_width (byte_)width. And keeping the latter as uint8_t will 
also implicitly limit register size to 256 bytes which I think is a 
reasonable size limit.


-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] AMD VMMCALL and VM86 mode

2016-12-12 Thread Suravee Suthikulpanit

Hi Andrew/Boris,

On 12/12/16 01:37, Andrew Cooper wrote:

On 11/12/16 17:33, Boris Ostrovsky wrote:

- andrew.coop...@citrix.com wrote:


On 09/12/16 19:55, Andrew Cooper wrote:

On 09/12/16 19:55, Boris Ostrovsky wrote:

On 12/09/2016 02:01 PM, Andrew Cooper wrote:

Hello,

While working on XSA-192, I found a curious thing.  On AMD

hardware, the

VMMCALL instruction appears to behave like a nop if executed in

VM86

mode.  All other processor modes work fine.

The documentation suggests it should be valid in any situation,

but I

never get a #VMEXIT from it.

And I assume GENERAL2_INTERCEPT_VMMCALL is set (which is what we

have in

Xen by default)?

Yes, because I have already used hypercalls to get text to the

console

before entering vm86 mode.


What happens if you don't set it?

Let me do some hacking and see.

Outside of vm86 mode, VMMCALL raises #UD, which is expected as it
wasn't
intercepted.

From within vm86 mode, I now get #GP rather than #UD.

There is certainly an argument to be made that VMMCALL inside vm86
mode
should trap to the vm86 monitor and #GP would be the expected way of
that happening, but this also doesn't match the documentation.


Just curious: why do you think #GP could be a reasonable exception here?


In vm86 mode, #GP is raised for privileged instructions which should
break for auditing in the vm86 monitor.  It would be reasonable to
include "talking to the hypervisor" as a privileged action.


It's #UD because if not intercepted it doesn't make sense to execute it.


I agree with this, if privilege isn't considered an issue.  If a
hypervisor doesn't actually get involved, the instruction shouldn't
complete.


But either way, I think AMD should clarify this. Suravee, can you find out what 
the expected behavior is?


IMO, it should either consistently #GP and break to the vm86 monitor, or
#UD/#VMEXIT depending on whether it is intercepted by the hypervisor.
Either way the documentation should be clarified.

Having said this, given that its behaviour is consistent on any AMD
processor I choose to try, and given that vm86 mode is very legacy these
days, I doubt a reasonable argument can be made to fixing the behaviour.

~Andrew



I'm checking with the HW folks. Let me get back to you.

Suravee

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/8] public / x86: Introduce __HYPERCALL_dm_op...

2016-12-12 Thread Wei Liu
On Tue, Dec 06, 2016 at 01:46:12PM +, Paul Durrant wrote:
> ...as a set of hypercalls to be used by a device model.
[...]
> +Introduction
> +
> +
> +A previous proposal for a 'DMOP' was put forward by Ian Jackson on the 1st
> +of August. This proposal seem very promising, however a problem was
> +identified with it, that this proposal addresses.
> +
> +The aim of DMOP, as before, is to prevent a compromised device model from
> +compromising domains other then the one it is associated with. (And is
> +therefore likely already compromised).
> +
> +The previous proposal adds a DMOP hypercall, for use by the device models,
> +which places the domain ID in a fixed place, within the calling args,
> +such that the priv driver can always find it, and not need to know any
> +further details about the particular DMOP in order to validate it against
> +previously set (vie ioctl) domain.
> +
> +The problem occurs when you have a DMOP with references to other user memory
> +objects, such as with Track dirty VRAM (As used with the VGA buffer).
> +Is this case, the address of this other user object needs to be vetted,
> +to ensure it is not within a restricted address ranges, such as kernel
> +memory. The real problem comes down to how you would vet this address -
> +the idea place to do this is within the privcmd driver, since it would have
> +knowledge of the address space involved. However, with a principal goal
> +of DMOP to allow privcmd to be free from any knowledge of DMOP's sub ops,
> +it would have no way to identify any user buffer  addresses that need
> +checking.  The alternative of allowing the hypervisor to vet the address
> +is also problematic, since it has no knowledge of the guest memory layout.
> +

Could you perhaps rewrite this section? The reference to "Ian Jackson on
the ..." and some other bits aren't really useful in the context of a
design doc.

The comparison between the old and new proposal isn't needed IMHO. There
has never been an old implementation / design doc in xen.git.

> +Validation by privcmd driver
> +--
> +
> +If the privcmd driver has been restricted to specific domain (using a
> + new ioctl), when it received an op, it will:
> +
> +1. Check hypercall is DMOP.
> +
> +2. Check domid == restricted domid.
> +
> +3. For each @nr_buffers in @buffers: Check @h and @len give a buffer
> +   wholey in the user space part of the virtual address space. (e.g.,
> +   on Linux use access_ok()).
> +
> +
> +Xen Implementation
> +--
> +
> +Since a DMOP sub of may need to copy or return a buffer from the guest,

"sub of" -> "sub op" ?

> +as well as the DMOP itself to git the initial buffer, functions for doing
> +this would be written as below.  Note that care is taken to prevent
> +damage from buffer under or over run situations.  If the DMOP is called
> +with insufficient buffers, zeros will be read, while extra is ignored.
> +
[...]
>  # make_device_model(priv, dm_dom, hvm_dom)
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 2c83544..cc37752 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h

Is DMOP considered stable in this patch?  I guess not, given the header
is enclosed by __XEN__ and __XEN_TOOLS__.

Let's keep it in libxc for now. Eventually it will need to be moved to
tools/libs.

> @@ -41,6 +41,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
> index d57c39a..8e49635 100644
> --- a/tools/libxc/xc_private.c
> +++ b/tools/libxc/xc_private.c
> @@ -776,6 +776,76 @@ int xc_ffs64(uint64_t x)
>  return l ? xc_ffs32(l) : h ? xc_ffs32(h) + 32 : 0;
>  }
>  
> +int do_dm_op(xc_interface *xch, domid_t domid, unsigned int nr_bufs, ...)
> +{
> +int ret = -1;
> +struct  {
> +void *u;
> +void *h;
> +} *bounce;
> +DECLARE_HYPERCALL_BUFFER(xen_dm_op_buf_t, bufs);
> +va_list args;
> +unsigned int idx;
> +
> +bounce = calloc(nr_bufs, sizeof(*bounce));
> +if ( bounce == NULL )
> +goto fail1;
> +
> +bufs = xc_hypercall_buffer_alloc(xch, bufs, sizeof(*bufs) * nr_bufs);
> +if ( bufs == NULL )
> +goto fail2;
> +
> +va_start(args, nr_bufs);
> +for (idx = 0; idx < nr_bufs; idx++)

for ( idx .. idx++ )

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 06/15] domctl: Add XEN_DOMCTL_acpi_access

2016-12-12 Thread Julien Grall

Hi Boris,

On 29/11/16 15:33, Boris Ostrovsky wrote:

This domctl will allow toolstack to read and write some
ACPI registers. It will be available to both x86 and ARM
but will be implemented first only for x86


Can you explain why we would need this on ARM?

Cheers,



Signed-off-by: Boris Ostrovsky 
---
Changes in v4:
* New patch

 xen/arch/x86/domctl.c|  9 +
 xen/arch/x86/hvm/Makefile|  1 +
 xen/arch/x86/hvm/acpi.c  | 24 
 xen/include/asm-x86/hvm/domain.h |  3 +++
 xen/include/public/domctl.h  | 25 +
 5 files changed, 62 insertions(+)
 create mode 100644 xen/arch/x86/hvm/acpi.c

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 2a2fe04..111bcbb 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1430,6 +1430,15 @@ long arch_do_domctl(
 }
 break;

+case XEN_DOMCTL_acpi_access:
+if ( !is_hvm_domain(d) )
+ret = -EINVAL;
+else
+ret = hvm_acpi_domctl_access(d, domctl->u.acpi_access.rw,
+ &domctl->u.acpi_access.gas,
+ domctl->u.acpi_access.val);
+break;
+
 default:
 ret = iommu_do_domctl(domctl, d, u_domctl);
 break;
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index f750d13..bae3244 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -1,6 +1,7 @@
 subdir-y += svm
 subdir-y += vmx

+obj-y += acpi.o
 obj-y += asid.o
 obj-y += emulate.o
 obj-y += hpet.o
diff --git a/xen/arch/x86/hvm/acpi.c b/xen/arch/x86/hvm/acpi.c
new file mode 100644
index 000..7d42eaf
--- /dev/null
+++ b/xen/arch/x86/hvm/acpi.c
@@ -0,0 +1,24 @@
+/* acpi.c: ACPI access handling
+ *
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ */
+#include 
+#include 
+#include 
+
+
+int hvm_acpi_domctl_access(struct domain *d, uint8_t rw, gas_t *gas,
+   XEN_GUEST_HANDLE_PARAM(uint8) arg)
+{
+return -ENOSYS;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index d55b432..c5cd86c 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -158,6 +158,9 @@ struct hvm_domain {

 #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)

+int hvm_acpi_domctl_access(struct domain *currd, uint8_t rw, gas_t *gas,
+   XEN_GUEST_HANDLE_PARAM(uint8) arg);
+
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */

 /*
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 177319d..26fe009 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1144,6 +1144,29 @@ struct xen_domctl_psr_cat_op {
 typedef struct xen_domctl_psr_cat_op xen_domctl_psr_cat_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cat_op_t);

+/* ACPI Generic Address Structure */
+typedef struct gas {
+#define XEN_ACPI_SYSTEM_MEMORY 0
+#define XEN_ACPI_SYSTEM_IO 1
+uint8_tspace_id;   /* Address space */
+uint8_tbit_width;  /* Size in bits of given register */
+uint8_tbit_offset; /* Bit offset within the register */
+uint8_taccess_width;   /* Minimum Access size (ACPI 3.0) */
+uint64_t   address;/* 64-bit address of register */
+} gas_t;
+
+struct xen_domctl_acpi_access {
+gas_t  gas;/* IN: Register being accessed */
+
+#define XEN_DOMCTL_ACPI_READ   0
+#define XEN_DOMCTL_ACPI_WRITE  1
+uint8_trw; /* IN: Read or write */
+
+XEN_GUEST_HANDLE_64(uint8) val;/* IN/OUT: data */
+};
+typedef struct xen_domctl_acpi_access xen_domctl_acpi_access_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_acpi_access_t);
+
 struct xen_domctl {
 uint32_t cmd;
 #define XEN_DOMCTL_createdomain   1
@@ -1221,6 +1244,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_monitor_op77
 #define XEN_DOMCTL_psr_cat_op78
 #define XEN_DOMCTL_soft_reset79
+#define XEN_DOMCTL_acpi_access   80
 #define XEN_DOMCTL_gdbsx_guestmemio1000
 #define XEN_DOMCTL_gdbsx_pausevcpu 1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu   1002
@@ -1283,6 +1307,7 @@ struct xen_domctl {
 struct xen_domctl_psr_cmt_oppsr_cmt_op;
 struct xen_domctl_monitor_opmonitor_op;
 struct xen_domctl_psr_cat_oppsr_cat_op;
+struct xen_domctl_acpi_access   acpi_access;
 uint8_t pad[128];
 } u;
 };



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/8] dm_op: convert HVMOP_*ioreq_server*

2016-12-12 Thread Wei Liu
On Tue, Dec 06, 2016 at 01:46:13PM +, Paul Durrant wrote:
> The definitions of HVM_IOREQSRV_BUFIOREQ_* have to persist as they are
> already in use by callers of the libxc interface.
> 
> Suggested-by: Jan Beulich 
> Signed-off-by: Paul Durrant 
> --
> Cc: Jan Beulich 
> Cc: Ian Jackson 
> Cc: Wei Liu 
> Cc: Andrew Cooper 
> Cc: Daniel De Graaf 
> 
> v2:
> - Addressed several comments from Jan.
> ---
>  tools/libxc/xc_domain.c  | 212 -

Toolstack changes in patch 2-8 look rather mechanical to me, so for all
7:

Acked-by: Wei Liu 

Provided HV maintainers are happy with with the dmop interface.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-4.5-testing test] 103161: regressions - FAIL

2016-12-12 Thread Andrew Cooper
On 12/12/16 11:47, Ian Jackson wrote:
> Rosstest service owner writes ("[xen-4.5-testing test] 103161: regressions - 
> FAIL"):
>> flight 103161 xen-4.5-testing real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/103161/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-xtf-amd64-amd64-2   52 leak-check/check fail REGR. vs. 102721
>>  test-xtf-amd64-amd64-4   52 leak-check/check fail REGR. vs. 102721
>>  test-xtf-amd64-amd64-3   52 leak-check/check fail REGR. vs. 102721
>>  test-xtf-amd64-amd64-5   52 leak-check/check fail REGR. vs. 102721
>>  test-xtf-amd64-amd64-1   52 leak-check/check fail REGR. vs. 102721
> AIUI this is a known and accepted regression in Xen 4.5: some of the
> strange things done by one of the XTF tests causes a guest crash
> (after a security updates was applied), and XTF does not clean up
> after itself particularly well.

No.  These are not the issues we discussed.  The acceptable regression
is the invlpg~shadow test case only.


These leak check failures are because the XSA-195 PoC is somehow causing
qemu to segfault.  I cant explain why, because qemu should be involved
at all, and it is especially suspicious that it is only older branches
where this happens.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] libxl: QED disks support

2016-12-12 Thread Wei Liu
On Thu, Dec 08, 2016 at 02:52:50PM +0100, Cédric Bosdonnat wrote:
> Qdisk supports qcow and qcow2, extend it to also support qed disk
> format.
> 
> Signed-off-by: Cédric Bosdonnat 
> ---
>  v2:
>* Add qed to the list for possible format values in 
> xl-disk-configuration.txt
>* Add LIBXL_HAVE_QED
> 
> ---
>  docs/misc/xl-disk-configuration.txt |4 +-
>  tools/libxl/libxl.h |7 +
>  tools/libxl/libxl_device.c  |1 +
>  tools/libxl/libxl_dm.c  |1 +
>  tools/libxl/libxl_types.idl |1 +
>  tools/libxl/libxl_utils.c   |2 +
>  tools/libxl/libxlu_disk_l.c | 1018 
> ++-
>  tools/libxl/libxlu_disk_l.h |   53 +-
>  tools/libxl/libxlu_disk_l.l |3 +-
>  9 files changed, 548 insertions(+), 542 deletions(-)
> 
> diff --git a/docs/misc/xl-disk-configuration.txt 
> b/docs/misc/xl-disk-configuration.txt
> index b3402bc33a..310d2586c0 100644
> --- a/docs/misc/xl-disk-configuration.txt
> +++ b/docs/misc/xl-disk-configuration.txt
> @@ -87,7 +87,7 @@ format
>  --
>  
>  Description:   Specifies the format of image file.
> -Supported values:  raw, qcow, qcow2, vhd
> +Supported values:  raw, qcow, qcow2, vhd, qed
>  Deprecated values: None
>  Default value: raw
>  
> @@ -311,7 +311,7 @@ are found prepended to the format parameter - eg 
> "tap:aio:qcow:".
>  -
>  
>  Description:   Specifies the format (deprecated)
> -Supported values:  raw:  qcow2:  vhd:
> +Supported values:  raw:  qcow2:  vhd:  qed:

Sorry for not having noticed this earlier: we shouldn't extend the
deprecated syntax here.

I presume there is change to .l file to support this, please remove it.

Other than the issue mentioned above, this patch looks good to me.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [MULTIBOOT2 DOC PATCH v3 01/13] multiboot2: Replace u_phys with u32

2016-12-12 Thread Daniel Kiper
On Sat, Dec 10, 2016 at 08:23:15PM +0300, Andrei Borzenkov wrote:
> 07.12.2016 01:52, Daniel Kiper пишет:
> > u_phys is used just in two places and sometimes it may confuse reader.
> > Additionally, GRUB multiboot2 implementation does not use u_phys anywhere.
> > So, replace it with basic well defined and used in implementation u32 type.
> >
> > Signed-off-by: Daniel Kiper 
> > ---
> >  doc/multiboot.texi |   11 ---
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/doc/multiboot.texi b/doc/multiboot.texi
> > index 4b92918..2bda9b7 100644
> > --- a/doc/multiboot.texi
> > +++ b/doc/multiboot.texi
> > @@ -299,9 +299,6 @@ little-endian, u32 is coded in little-endian.
> >  The type of unsigned 64-bit data. Because the target architecture is
> >  little-endian, u64 is coded in little-endian.
> >
> > -@item u_phys
> > -The type of unsigned data of the same size as target architecture physical 
> > address size.
> > -
> >  @item u_virt
> >  The type of unsigned data of the same size as target architecture virtual 
> > address size.
> >
>
> So if I understand it correctly, any address used in multiboot2 is
> limited to 32 bit, so anything that is relevant to boot protocol must

More or less. There are some exceptions when EFI x64 platforms come on the 
stage.
It is described in spec.

> reside below 4G. Is my assumption correct?

Yep. More info you can find in patch #07.

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 14/14] xen/x86: setup PVHv2 Dom0 ACPI tables

2016-12-12 Thread Jan Beulich
>>> On 30.11.16 at 17:49,  wrote:
> @@ -567,7 +573,7 @@ static __init void hvm_setup_e820(struct domain *d, 
> unsigned long nr_pages)
>  /*
>   * Craft the e820 memory map for Dom0 based on the hardware e820 map.
>   */
> -d->arch.e820 = xzalloc_array(struct e820entry, e820.nr_map);
> +d->arch.e820 = xzalloc_array(struct e820entry, E820MAX);

I have to admit that I consider this both wasteful and inflexible.
While commonly you'll need far less entries, what if in fact you
need more?

> +static int __init hvm_setup_acpi_madt(struct domain *d, paddr_t *addr)
> +{
> +struct acpi_table_madt *madt;
> +struct acpi_table_header *table;
> +struct acpi_madt_io_apic *io_apic;
> +struct acpi_madt_local_apic *local_apic;

I think I had asked about the absence of struct acpi_madt_local_x2apic
here before, but now that I look again I also wonder how you get away
without struct acpi_madt_nmi_source and struct acpi_madt_local_apic_nmi;
I can kind of see struct acpi_madt_local_apic_override not being
needed here (and I'd really hope anyway that no BIOS actually makes
use of it). The question mainly is what possible damage there may be
if what Dom0 gets to see disagrees with what the firmware acts on
(remember that AML code may alter MADT contents).

> +acpi_status status;
> +unsigned long size;
> +unsigned int i;
> +int rc;
> +
> +/* Count number of interrupt overrides in the MADT. */
> +acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
> +  acpi_count_intr_ovr, MAX_IRQ_SOURCES);

What's the reason for using MAX_IRQ_SOURCES here? There's no
static array involved here limiting the supported count.

> +/* Calculate the size of the crafted MADT. */
> +size = sizeof(*madt);
> +size += sizeof(*io_apic);
> +size += sizeof(*local_apic) * dom0_max_vcpus();
> +size += sizeof(struct acpi_madt_interrupt_override) * 
> acpi_intr_overrrides;

sizeof(*intsrcovr)

> +madt = xzalloc_bytes(size);
> +if ( !madt )
> +{
> +printk("Unable to allocate memory for MADT table\n");
> +return -ENOMEM;
> +}
> +
> +/* Copy the native MADT table header. */
> +status = acpi_get_table(ACPI_SIG_MADT, 0, &table);
> +if ( !ACPI_SUCCESS(status) )
> +{
> +printk("Failed to get MADT ACPI table, aborting.\n");
> +return -EINVAL;
> +}
> +*((struct acpi_table_header *)madt) = *table;

madt.header = *table;

> +madt->address = APIC_DEFAULT_PHYS_BASE;
> +/*
> + * NB: this is currently set to 4, which is the revision in the ACPI
> + * spec 6.1. Sadly ACPICA doesn't provide revision numbers for the
> + * tables described in the headers.
> + */
> +madt->header.revision = 4;

I can see you wanting to cap the revision here, but is it safe to
possibly bump it from what firmware has?

> +/*
> + * Setup the IO APIC entry.
> + * FIXME: the current vIO-APIC code just supports one IO-APIC instance
> + * per domain. This must be fixed in order to provide the same amount of
> + * IO APICs as available on bare metal.
> + */
> +if ( nr_ioapics > 1 )
> +printk("WARNING: found %d IO APICs, Dom0 will only have access to 1 
> emulated IO APIC\n",
> +   nr_ioapics);
> +io_apic = (struct acpi_madt_io_apic *)(madt + 1);
> +io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
> +io_apic->header.length = sizeof(*io_apic);
> +io_apic->id = 1;

Is it safe to use an ID other than what firmware did assign? Where
is this 1 coming from? And how does this get sync-ed up with the
respective struct hvm_hw_vioapic field?

> +io_apic->address = VIOAPIC_DEFAULT_BASE_ADDRESS;
> +
> +local_apic = (struct acpi_madt_local_apic *)(io_apic + 1);
> +for ( i = 0; i < dom0_max_vcpus(); i++ )
> +{
> +local_apic->header.type = ACPI_MADT_TYPE_LOCAL_APIC;
> +local_apic->header.length = sizeof(*local_apic);
> +local_apic->processor_id = i;
> +local_apic->id = i * 2;
> +local_apic->lapic_flags = ACPI_MADT_ENABLED;
> +local_apic++;
> +}
> +
> +/* Setup interrupt overrides. */
> +intsrcovr = (struct acpi_madt_interrupt_override *)local_apic;
> +acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, 
> acpi_set_intr_ovr,
> +  MAX_IRQ_SOURCES);
> +ASSERT(((unsigned char *)intsrcovr - (unsigned char *)madt) == size);

Likely easier to read if you used void * or long for the casts here.

> +static bool __init hvm_acpi_table_allowed(const char *sig)
> +{
> +static const char __init banned_tables[][ACPI_NAME_SIZE] = {
> +ACPI_SIG_HPET, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_MPST,
> +ACPI_SIG_PMTT, ACPI_SIG_MADT, ACPI_SIG_DMAR};
> +unsigned long pfn, nr_pages;
> +int i;
> +
> +for ( i = 0 ; i < ARRAY_SIZE(banned_tables); i++ )
> +if ( strncmp(sig, banned_tables[i], ACPI_NAME_SIZE) == 0 )
> +return fa

[Xen-devel] [PATCH] xsm: allow relevant permission during migrate and gpu-passthrough.

2016-12-12 Thread Anshul Makkar
During guest migrate allow permission to prevent
spurious page faults.
Prevents these errors:
d73: Non-privileged (73) attempt to map I/O space 

avc: denied  { set_misc_info } for domid=0 target=11
scontext=system_u:system_r:dom0_t
tcontext=system_u:system_r:domU_t tclass=domain

GPU passthrough for hvm guest:
avc:  denied  { send_irq } for domid=0 target=10
scontext=system_u:system_r:dom0_t
tcontext=system_u:system_r:domU_t tclass=hvm

Signed-off-by: Anshul Makkar 
---

 tools/flask/policy/modules/xen.if |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if 
b/tools/flask/policy/modules/xen.if
index eb646f5..1aca75d 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -49,7 +49,7 @@ define(`create_domain_common', `
allow $1 $2:domain { create max_vcpus setdomainmaxmem setaddrsize
getdomaininfo hypercall setvcpucontext getscheduler
getvcpuinfo getaddrsize getaffinity setaffinity
-   settime setdomainhandle getvcpucontext };
+   settime setdomainhandle getvcpucontext set_misc_info };
allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim
set_max_evtchn set_vnumainfo get_vnumainfo cacheflush
psr_cmt_op psr_cat_op soft_reset };
@@ -58,7 +58,7 @@ define(`create_domain_common', `
allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
mmuext_op updatemp };
allow $1 $2:grant setup;
allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
-   setparam pcilevel trackdirtyvram nested altp2mhvm 
altp2mhvm_op };
+   setparam pcilevel trackdirtyvram nested altp2mhvm 
altp2mhvm_op send_irq };
 ')
 
 # create_domain(priv, target)
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 06/15] domctl: Add XEN_DOMCTL_acpi_access

2016-12-12 Thread Jan Beulich
>>> On 12.12.16 at 14:08,  wrote:

> 
> On 12/02/2016 02:48 AM, Jan Beulich wrote:
> On 01.12.16 at 17:43,  wrote:
>>> On 12/01/2016 11:06 AM, Jan Beulich wrote:

> +++ b/xen/include/public/domctl.h
> @@ -1144,6 +1144,29 @@ struct xen_domctl_psr_cat_op {
>  typedef struct xen_domctl_psr_cat_op xen_domctl_psr_cat_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cat_op_t);
>
> +/* ACPI Generic Address Structure */
> +typedef struct gas {
 xen_acpi_gas

> +#define XEN_ACPI_SYSTEM_MEMORY 0
> +#define XEN_ACPI_SYSTEM_IO 1
> +uint8_tspace_id;   /* Address space */
> +uint8_tbit_width;  /* Size in bits of given register */
> +uint8_tbit_offset; /* Bit offset within the register */
> +uint8_taccess_width;   /* Minimum Access size (ACPI 3.0) */
> +uint64_t   address;/* 64-bit address of register */
 uint64_aligned_t with explicit padding added ahead of it.

 And then there's the question of what uses of this will look like:
 I'm not convinced we need to stick to the exact ACPI layout
 here, unless you expect (or could imagine) for the tool stack to
 hold GAS structures coming from elsewhere in its hands. If we
 don't follow the layout as strictly, we could namely widen
 bit_width (and maybe bit_offset) to allow for larger transfers
 in one go. And in such a relaxed model I don't think we'd need
 access_width at all as a field.
>>>
>>> There is indeed no current need to use actual ACPI GAS layout but then
>>> it's not GAS, really, and should be named something else.
>>
>> Which of course is fine by me; I had referred to that structure only
>> for the underlying principle of specifying how to access the data.
> 
> Are there any registers that are not byte-aligned or not whole number of 
> bytes?
> 
> I am thinking about dropping bit_offset (along with access_width) and 
> making bit_width (byte_)width. And keeping the latter as uint8_t will 
> also implicitly limit register size to 256 bytes which I think is a 
> reasonable size limit.

Since we're doing the emulation (and hence defining the registers)
we could require no such unusual registers. This would be something
we can't simplify only if we foresee ever needing to hand through a
hardware register without interposing any emulation.

Whether limiting to 256 bytes is reasonable I'm not so sure, otoh.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3] libxl: QED disks support

2016-12-12 Thread Cédric Bosdonnat
Qdisk supports qcow and qcow2, extend it to also support qed disk
format.

Signed-off-by: Cédric Bosdonnat 
---
 v2:
   * Add qed to the list for possible format values in xl-disk-configuration.txt
   * Add LIBXL_HAVE_QED

 v3:
   * Remove the qed: obsolete prefix support
---
 docs/misc/xl-disk-configuration.txt |2 +-
 tools/libxl/libxl.h |7 +
 tools/libxl/libxl_device.c  |1 +
 tools/libxl/libxl_dm.c  |1 +
 tools/libxl/libxl_types.idl |1 +
 tools/libxl/libxl_utils.c   |2 +
 tools/libxl/libxlu_disk_l.c | 1018 ++-
 tools/libxl/libxlu_disk_l.h |   53 +-
 tools/libxl/libxlu_disk_l.l |1 +
 9 files changed, 546 insertions(+), 540 deletions(-)

diff --git a/docs/misc/xl-disk-configuration.txt 
b/docs/misc/xl-disk-configuration.txt
index b3402bc33a..926889b23d 100644
--- a/docs/misc/xl-disk-configuration.txt
+++ b/docs/misc/xl-disk-configuration.txt
@@ -87,7 +87,7 @@ format
 --
 
 Description:   Specifies the format of image file.
-Supported values:  raw, qcow, qcow2, vhd
+Supported values:  raw, qcow, qcow2, vhd, qed
 Deprecated values: None
 Default value: raw
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index acbf47690e..3924464588 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1012,6 +1012,13 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, 
const libxl_mac *src);
  */
 #define LIBXL_HAVE_MEMKB_64BITS 1
 
+/*
+ * LIBXL_HAVE_QED
+ *
+ * If this is defined QED disk formats can be used for both HVM and PV guests.
+ */
+#define LIBXL_HAVE_QED 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 3e7a1026c4..6c34141072 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -411,6 +411,7 @@ char *libxl__device_disk_string_of_format(libxl_disk_format 
format)
 case LIBXL_DISK_FORMAT_VHD: return "vhd";
 case LIBXL_DISK_FORMAT_RAW:
 case LIBXL_DISK_FORMAT_EMPTY: return "aio";
+case LIBXL_DISK_FORMAT_QED: return "qed";
 default: return NULL;
 }
 }
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index ad366a8cd3..8b373422f1 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -677,6 +677,7 @@ static const char 
*qemu_disk_format_string(libxl_disk_format format)
 case LIBXL_DISK_FORMAT_VHD: return "vpc";
 case LIBXL_DISK_FORMAT_RAW: return "raw";
 case LIBXL_DISK_FORMAT_EMPTY: return NULL;
+case LIBXL_DISK_FORMAT_QED: return "qed";
 default: return NULL;
 }
 }
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a32c751b0e..a612d1f4ff 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -114,6 +114,7 @@ libxl_disk_format = Enumeration("disk_format", [
 (3, "VHD"),
 (4, "RAW"),
 (5, "EMPTY"),
+(6, "QED"),
 ])
 
 libxl_disk_backend = Enumeration("disk_backend", [
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 49cbaa5b70..507ee56c7c 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -317,6 +317,8 @@ int libxl_string_to_backend(libxl_ctx *ctx, char *s, 
libxl_disk_backend *backend
 *backend = LIBXL_DISK_BACKEND_QDISK;
 } else if (!strcmp(p, "qcow2")) {
 *backend = LIBXL_DISK_BACKEND_QDISK;
+} else if (!strcmp(p, "qed")) {
+*backend = LIBXL_DISK_BACKEND_QDISK;
 }
 }
 out:
diff --git a/tools/libxl/libxlu_disk_l.c b/tools/libxl/libxlu_disk_l.c
index 54160caa66..fa09a69303 100644
--- a/tools/libxl/libxlu_disk_l.c
+++ b/tools/libxl/libxlu_disk_l.c
@@ -1,10 +1,7 @@
 #line 2 "libxlu_disk_l.c"
-#line 31 "libxlu_disk_l.l"
 #include "libxl_osdeps.h" /* must come before any other headers */
 
-
-
-#line 8 "libxlu_disk_l.c"
+#line 5 "libxlu_disk_l.c"
 
 #define  YY_INT_ALIGNED short int
 
@@ -12,8 +9,8 @@
 
 #define FLEX_SCANNER
 #define YY_FLEX_MAJOR_VERSION 2
-#define YY_FLEX_MINOR_VERSION 5
-#define YY_FLEX_SUBMINOR_VERSION 39
+#define YY_FLEX_MINOR_VERSION 6
+#define YY_FLEX_SUBMINOR_VERSION 1
 #if YY_FLEX_SUBMINOR_VERSION > 0
 #define FLEX_BETA
 #endif
@@ -92,25 +89,13 @@ typedef unsigned int flex_uint32_t;
 
 #endif /* ! FLEXINT_H */
 
-#ifdef __cplusplus
-
-/* The "const" storage-class-modifier is valid. */
-#define YY_USE_CONST
-
-#else  /* ! __cplusplus */
-
-/* C99 requires __STDC__ to be defined as 1. */
-#if defined (__STDC__)
-
-#define YY_USE_CONST
-
-#endif /* defined (__STDC__) */
-#endif /* ! __cplusplus */
-
-#ifdef YY_USE_CONST
+/* TODO: this is always defined, so inline it */
 #define yyconst const
+
+#if defined(__GNUC__) && __GNUC__ >= 3
+#define yynoreturn __attribute__((__noreturn__))
 #else
-#define yyconst
+#define yynoreturn
 #endif
 
 /* Returned

Re: [Xen-devel] [PATCH] xen/balloon: Only mark a page as managed when it is released

2016-12-12 Thread Juergen Gross
On 09/12/16 18:10, Ross Lagerwall wrote:
> Only mark a page as managed when it is released back to the allocator.
> This ensures that the managed page count does not get falsely increased
> when a VM is running. Correspondingly change it so that pages are
> marked as unmanaged after getting them from the allocator.
> 
> Signed-off-by: Ross Lagerwall 

Committed to xen/tip.git for-linus-4.10


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv6] xenbus: fix deadlock on writes to /proc/xen/xenbus

2016-12-12 Thread Juergen Gross
On 09/12/16 15:41, David Vrabel wrote:
> /proc/xen/xenbus does not work correctly.  A read blocked waiting for
> a xenstore message holds the mutex needed for atomic file position
> updates.  This blocks any writes on the same file handle, which can
> deadlock if the write is needed to unblock the read.
> 
> Clear FMODE_ATOMIC_POS when opening this device to always get
> character device like sematics.
> 
> Signed-off-by: David Vrabel 

Committed to xen/tip.git for-linus-4.10


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


  1   2   3   >