Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Steve Freitas

On 11/27/2014 01:27 AM, Jan Beulich wrote:

This was precisely the reason why I told you that the numbering
differs (and is confusing and has nothing to do with actual C state
numbers): What max_cstate refers to in the mwait-idle driver is
what above is listed as type[Cx], i.e. the state at index 1 is C1, at
2 we've got C1E, and at 3 we've got C2. And those still aren't in
line with the numbering the CPU documentation uses, it's rather
kind of meant to refer to the ACPI numbering (but probably also
not fully matching up).


Ah, thanks for the explanation.


So max_cstate=2 working suggests a problem with what the CPU
calls C6, which presumably isn't all that surprising considering the
many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not
sure how to proceed from here - I suppose you already made
sure you run with the latest available BIOS.


Yes, latest available BIOS.


And with 6 errata
documented it's not all that unlikely that there's a 7th one with
MONITOR/MWAIT behavior. The commit you bisected to (and
which you had verified to be the culprit by just forcing
arch_skip_send_event_check() to always return false) could be
reasonably assumed to be broken only when MWAIT use for all
C states didn't work.


Now I did get a hang with max_cstate=3 and mwait-idle=0. May I assume 
that mwait-idle=0 means that ACPI is responsible for the throttling?


Thanks again for all your help!

Steve

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 00:28 +, M A Young wrote:
> Migrations with xl migrate --debug will fail because debugging information 
> from the receiving process is written to the stdout channel. This channel 
> is also used for status messages so the migration will fail as the sending 
> process receives an unexpected message. This patch moves the debugging 
> information to the stderr channel.
> 
> Version 2 moves some output back to stdout that was accidentally moved 
> to stderr in the first version.
> 
> Signed-off-by: Michael Young 

I think you've forgotten the attachment.

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm

2014-11-28 Thread Yu Zhang
XenGT (Intel Graphics Virtualization technology, please refer to
https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
xengt) driver runs inside Dom0 as a virtual graphics device model,
and needs to trap and emulate the guest's write operations to some
specific memory pages, like memory pages used by guest graphics
driver as PPGTT(per-process graphics translation table). We agreed
to add a new p2m type "p2m_mmio_write_dm" to trap the write operation
on these graphic page tables.

Handling of this new p2m type are similar with existing p2m_ram_ro
in most condition checks, with only difference on final policy of
emulation vs. drop. For p2m_mmio_write_dm type pages, writes will
go to the device model via ioreq-server.

Previously, the conclusion in our v3 patch review is to provide a
more generalized HVMOP_map_io_range_to_ioreq_server hypercall, by
seperating rangesets inside a ioreq server to read-protected/write-
protected/both-prtected. Yet, after offline discussion with Paul,
we believe a more simplified solution may suffice. We can keep the
existing HVMOP_map_io_range_to_ioreq_server hypercall, and let the
user decide whether or not a p2m type change is necessary, because
in most cases the emulator will already use the p2m_mmio_dm type.

Changes from v3:
 - Use the existing HVMOP_map_io_range_to_ioreq_server hypercall
   to add write protected range.
 - Modify the HVMOP_set_mem_type hypercall to support the new p2m
   type for this range

Changes from v2:
 - Remove excute attribute of the new p2m type p2m_mmio_write_dm
 - Use existing rangeset for keeping the write protection page range
   instead of introducing hash table.
 - Some code style fix.

Changes from v1:
 - Changes the new p2m type name from p2m_ram_wp to p2m_mmio_write_dm.
   This means that we treat the pages as a special mmio range instead
   of ram.
 - Move macros to c file since only this file is using them.
 - Address various comments from Jan.

Signed-off-by: Yu Zhang 
Signed-off-by: Wei Ye 
---
 xen/arch/x86/hvm/hvm.c  | 13 ++---
 xen/arch/x86/mm/p2m-ept.c   |  1 +
 xen/arch/x86/mm/p2m-pt.c|  1 +
 xen/include/asm-x86/p2m.h   |  1 +
 xen/include/public/hvm/hvm_op.h |  1 +
 5 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8f49b44..5f806e8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2838,7 +2838,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
  * to the mmio handler.
  */
 if ( (p2mt == p2m_mmio_dm) || 
- (npfec.write_access && (p2mt == p2m_ram_ro)) )
+ (npfec.write_access &&
+   ((p2mt == p2m_ram_ro) || (p2mt == p2m_mmio_write_dm))) )
 {
 put_gfn(p2m->domain, gfn);
 
@@ -5922,6 +5923,8 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 a.mem_type =  HVMMEM_ram_rw;
 else if ( p2m_is_grant(t) )
 a.mem_type =  HVMMEM_ram_rw;
+else if ( t == p2m_mmio_write_dm )
+a.mem_type = HVMMEM_mmio_write_dm;
 else
 a.mem_type =  HVMMEM_mmio_dm;
 rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
@@ -5941,7 +5944,8 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 static const p2m_type_t memtype[] = {
 [HVMMEM_ram_rw]  = p2m_ram_rw,
 [HVMMEM_ram_ro]  = p2m_ram_ro,
-[HVMMEM_mmio_dm] = p2m_mmio_dm
+[HVMMEM_mmio_dm] = p2m_mmio_dm,
+[HVMMEM_mmio_write_dm] = p2m_mmio_write_dm
 };
 
 if ( copy_from_guest(&a, arg, 1) )
@@ -5987,14 +5991,17 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 rc = -EAGAIN;
 goto param_fail4;
 }
+
 if ( !p2m_is_ram(t) &&
- (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) )
+ (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
+ t != p2m_mmio_write_dm )
 {
 put_gfn(d, pfn);
 goto param_fail4;
 }
 
 rc = p2m_change_type_one(d, pfn, t, memtype[a.hvmmem_type]);
+
 put_gfn(d, pfn);
 if ( rc )
 goto param_fail4;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 15c6e83..e21a92d 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -136,6 +136,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, 
p2m_type_t type, p2m_acces
 entry->x = 0;
 break;
 case p2m_grant_map_ro:
+case p2m_mmio_write_dm:
 entry->r = 1;
 entry->w = entry->x = 0;
 break;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index e48b63a..26fb18d 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -94,6 +94,7 @@ static unsigned long p2m_type_t

Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug

2014-11-28 Thread M A Young

On Fri, 28 Nov 2014, Ian Campbell wrote:


On Fri, 2014-11-28 at 00:28 +, M A Young wrote:

Migrations with xl migrate --debug will fail because debugging information
from the receiving process is written to the stdout channel. This channel
is also used for status messages so the migration will fail as the sending
process receives an unexpected message. This patch moves the debugging
information to the stderr channel.

Version 2 moves some output back to stdout that was accidentally moved
to stderr in the first version.

Signed-off-by: Michael Young 


I think you've forgotten the attachment.


Yes, I did. Here it is.Use stderr for debugging messages for xl migrate --debug as stdout is used
for status messages.

--- xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c.orig 2014-10-24 15:22:40.0 
+0100
+++ xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c  2014-11-26 22:41:41.697043321 
+
@@ -380,10 +380,10 @@
 }
 static void printf_info(enum output_format output_format,
 int domid,
-libxl_domain_config *d_config)
+libxl_domain_config *d_config, FILE *fh)
 {
 if (output_format == OUTPUT_FORMAT_SXP)
-return printf_info_sexp(domid, d_config);
+return printf_info_sexp(domid, d_config, fh);
 
 const char *buf;
 libxl_yajl_length len = 0;
@@ -404,7 +404,7 @@
 if (s != yajl_gen_status_ok)
 goto out;
 
-puts(buf);
+fputs(buf, fh);
 
 out:
 yajl_gen_free(hand);
@@ -413,7 +413,13 @@
 fprintf(stderr,
 "unable to format domain config as JSON (YAJL:%d)\n", s);
 
-if (ferror(stdout) || fflush(stdout)) { perror("stdout"); exit(-1); }
+if (ferror(fh) || fflush(fh)) {
+if (fh == stdout)
+perror("stdout");
+else
+perror("stderr");
+exit(-1);
+}
 }
 
 static int do_daemonize(char *name)
@@ -2423,7 +2429,7 @@
 }
 
 if (!dom_info->quiet)
-printf("Parsing config from %s\n", config_source);
+fprintf(stderr, "Parsing config from %s\n", config_source);
 
 if (config_in_json) {
 libxl_domain_config_from_json(ctx, &d_config,
@@ -2451,7 +2457,7 @@
 }
 
 if (debug || dom_info->dryrun)
-printf_info(default_output_format, -1, &d_config);
+printf_info(default_output_format, -1, &d_config, stderr);
 
 ret = 0;
 if (dom_info->dryrun)
@@ -3403,7 +3409,7 @@
 if (default_output_format == OUTPUT_FORMAT_JSON)
 s = printf_info_one_json(hand, info[i].domid, &d_config);
 else
-printf_info_sexp(info[i].domid, &d_config);
+printf_info_sexp(info[i].domid, &d_config, stdout);
 libxl_domain_config_dispose(&d_config);
 if (s != yajl_gen_status_ok)
 goto out;
@@ -4725,7 +4731,7 @@
 parse_config_data(filename, config_data, config_len, &d_config);
 
 if (debug || dryrun_only)
-printf_info(default_output_format, -1, &d_config);
+printf_info(default_output_format, -1, &d_config, stdout);
 
 if (!dryrun_only) {
 fprintf(stderr, "setting dom%d configuration\n", domid);
--- xen-4.5.0-rc1/tools/libxl/xl_sxp.c.orig 2014-10-24 15:22:40.0 
+0100
+++ xen-4.5.0-rc1/tools/libxl/xl_sxp.c  2014-11-26 22:30:58.416394082 +
@@ -30,7 +30,7 @@
 /* In general you should not add new output to this function since it
  * is intended only for legacy use.
  */
-void printf_info_sexp(int domid, libxl_domain_config *d_config)
+void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh)
 {
 int i;
 libxl_dominfo info;
@@ -38,195 +38,195 @@
 libxl_domain_create_info *c_info = &d_config->c_info;
 libxl_domain_build_info *b_info = &d_config->b_info;
 
-printf("(domain\n\t(domid %d)\n", domid);
-printf("\t(create_info)\n");
-printf("\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM);
-printf("\t(hap %s)\n", libxl_defbool_to_string(c_info->hap));
-printf("\t(oos %s)\n", libxl_defbool_to_string(c_info->oos));
-printf("\t(ssidref %d)\n", c_info->ssidref);
-printf("\t(name %s)\n", c_info->name);
+fprintf(fh, "(domain\n\t(domid %d)\n", domid);
+fprintf(fh, "\t(create_info)\n");
+fprintf(fh, "\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM);
+fprintf(fh, "\t(hap %s)\n", libxl_defbool_to_string(c_info->hap));
+fprintf(fh, "\t(oos %s)\n", libxl_defbool_to_string(c_info->oos));
+fprintf(fh, "\t(ssidref %d)\n", c_info->ssidref);
+fprintf(fh, "\t(name %s)\n", c_info->name);
 
 /* retrieve the UUID from dominfo, since it is probably generated
  * during parsing and thus does not match the real one
  */
 if (libxl_domain_info(ctx, &info, domid) == 0) {
-printf("\t(uuid " LIBXL_UUID_FMT ")\n", LIBXL_UUID_BYTES(info.uuid));
+fprintf(fh, "\t(uuid " LIBXL_UUID_FMT ")\n", 
LIBXL_UUID_BYTES(info.uuid));
 } else {
-printf("\t(uuid )\n");
+fprintf(fh, "\t(uuid )\n");
 

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 09:24,  wrote:
>> And with 6 errata
>> documented it's not all that unlikely that there's a 7th one with
>> MONITOR/MWAIT behavior. The commit you bisected to (and
>> which you had verified to be the culprit by just forcing
>> arch_skip_send_event_check() to always return false) could be
>> reasonably assumed to be broken only when MWAIT use for all
>> C states didn't work.
> 
> Now I did get a hang with max_cstate=3 and mwait-idle=0.

According to the data you provided earlier, max_cstate=3 is
identical to not using that option at all when you also use
mwait-idle=0. It would make a difference only when not using
that latter option (and I specifically pointed this out in earlier
replies).

> May I assume 
> that mwait-idle=0 means that ACPI is responsible for the throttling?

ACPI is providing the data to do C state management in that case,
but it's still Xen doing things.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass

2014-11-28 Thread xen . org
flight 31882 xen-4.4-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31882/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 31781

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-sedf-pin  3 host-install(3) broken REGR. vs. 31781

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-armhf-armhf-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 xen  a39f202031d7f1d8d9e14b8c3d7d11c812db253e
baseline version:
 xen  7679aeb444ed3bc4de0f473c16c47eab7d2f9d33


People who touched revisions under test:
  Jan Beulich 


jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64   

Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 04:28,  wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4287,7 +4287,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
> unsigned int *ebx,
>   !host_tsc_is_safe() )
>  *edx &= ~cpufeat_mask(X86_FEATURE_RDTSCP);
>  /* Hide 1GB-superpage feature if we can't emulate it. */
> -if (!hvm_pse1gb_supported(d))
> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d))
>  *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);

With

#define hvm_pse1gb_supported(d) \
(cpu_has_page1gb && paging_mode_hap(d))

the change above is pointless. While, considering this, comments on
v2 may have been misleading, you should have simply updated the
patch description instead to clarify why the v2 change was okay
even for the shadow mode case.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 10:07,  wrote:
> flight 31882 xen-4.4-testing real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/31882/ 
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 
> 31781

Wasn't the swiotlb problem supposed to be dealt with by now?
swiotlb=65536 looks to be in place here, yet that still appears to
not be big enough...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Steve Freitas

On Nov 28, 2014, at 00:50, Jan Beulich  wrote:

 On 28.11.14 at 09:24,  wrote:
>>> And with 6 errata
>>> documented it's not all that unlikely that there's a 7th one with
>>> MONITOR/MWAIT behavior. The commit you bisected to (and
>>> which you had verified to be the culprit by just forcing
>>> arch_skip_send_event_check() to always return false) could be
>>> reasonably assumed to be broken only when MWAIT use for all
>>> C states didn't work.
>> 
>> Now I did get a hang with max_cstate=3 and mwait-idle=0.
> 
> According to the data you provided earlier, max_cstate=3 is
> identical to not using that option at all when you also use
> mwait-idle=0. It would make a difference only when not using
> that latter option (and I specifically pointed this out in earlier
> replies).
> 

Apologies for asking you to repeat yourself. Most of this stuff is over my head 
-- the only time I was this far down the rabbit hole was on an 8051.  Thanks 
for your patience. :-)

Steve


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 08:59,  wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2838,7 +2838,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
> long gla,
>   * to the mmio handler.
>   */
>  if ( (p2mt == p2m_mmio_dm) || 
> - (npfec.write_access && (p2mt == p2m_ram_ro)) )
> + (npfec.write_access &&
> + ((p2mt == p2m_ram_ro) || (p2mt == p2m_mmio_write_dm))) )

Why are you corrupting indentation here?

Furthermore the code you modify here suggests that p2m_ram_ro
already has the needed semantics - writes get passed to the DM.
None of the other changes you make, and none of the other uses
of p2m_ram_ro appear to be in conflict with your intentions, so
you'd really need to explain better why you need the new type.

> @@ -5922,6 +5923,8 @@ long do_hvm_op(unsigned long op, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  a.mem_type =  HVMMEM_ram_rw;
>  else if ( p2m_is_grant(t) )
>  a.mem_type =  HVMMEM_ram_rw;
> +else if ( t == p2m_mmio_write_dm )
> +a.mem_type = HVMMEM_mmio_write_dm;
>  else
>  a.mem_type =  HVMMEM_mmio_dm;
>  rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
> @@ -5941,7 +5944,8 @@ long do_hvm_op(unsigned long op, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  static const p2m_type_t memtype[] = {
>  [HVMMEM_ram_rw]  = p2m_ram_rw,
>  [HVMMEM_ram_ro]  = p2m_ram_ro,
> -[HVMMEM_mmio_dm] = p2m_mmio_dm
> +[HVMMEM_mmio_dm] = p2m_mmio_dm,
> +[HVMMEM_mmio_write_dm] = p2m_mmio_write_dm
>  };
>  
>  if ( copy_from_guest(&a, arg, 1) )
> @@ -5987,14 +5991,17 @@ long do_hvm_op(unsigned long op, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  rc = -EAGAIN;
>  goto param_fail4;
>  }
> +

Stray addition of a blank line?

>  if ( !p2m_is_ram(t) &&
> - (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) )
> + (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
> + t != p2m_mmio_write_dm )

Do you really want to permit e.g. transitions between mmio_dm and
mmio_write_dm? We should be as restrictive as possible here to not
open up paths to security problems.

>  {
>  put_gfn(d, pfn);
>  goto param_fail4;
>  }
>  
>  rc = p2m_change_type_one(d, pfn, t, memtype[a.hvmmem_type]);
> +
>  put_gfn(d, pfn);
>  if ( rc )
>  goto param_fail4;

Another stray newline addition.

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -72,6 +72,7 @@ typedef enum {
>  p2m_ram_shared = 12,  /* Shared or sharable memory */
>  p2m_ram_broken = 13,  /* Broken page, access cause domain crash 
> */
>  p2m_map_foreign  = 14,/* ram pages from foreign domain */
> +p2m_mmio_write_dm = 15,   /* Read-only; writes go to the device 
> model */
>  } p2m_type_t;
>  
>  /* Modifiers to the query */
> 

If the new type is really needed, shouldn't this get added to
P2M_RO_TYPES?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xenstore.h clarifications

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 12:51 +0200, Razvan Cojocaru wrote:

It's a good idea to CC the relevant maintainers if you want their input.

> Hello,
> 
> I know that xc_interface_open() can be safely called several times from
> the same process, and that several processes can each have a bunch of
> xc_interface handles open, and that I shouldn't use an xc_interface
> inherited from the parent in a child process, because xenctrl.h says so.
> 
> Is it safe to assume that the same restrictions / conventions apply to
> xs_handles obtained via xs_open()? Xenstore.h is not explicit. Looking
> at the code, it would seem safe to assume that it can be used in a
> similar manner, but it would be nice to have this confirmed if possible.

I think there's a pretty good chance that the same applies to xenstore
connections made over the device/shared-ring interface.

I'm not really sure about the semantics of a Unix domain socket after a
fork, but I don't expect both parent and child could sanely make use of
it.

So I think the answer is:

  * Connections made with xs_open(0) (which might be shared page or
socket based) are only guaranteed to work in the parent after
fork.
  * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable
in either the parent or the child after fork, but not both.
  * xs_daemon_open*() and xs_domain_open() are deprecated synonyms
for xs_open(0)
  * XS_OPEN_READONLY has not bearing on any of this.

Ian, does that seem right?

Razvan, assuming Ian concurs with the above (or corrects it) then could
you knock up a patch to document the result please.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xenstore.h clarifications

2014-11-28 Thread Razvan Cojocaru
On 11/28/2014 11:58 AM, Ian Campbell wrote:
> On Thu, 2014-11-27 at 12:51 +0200, Razvan Cojocaru wrote:
> 
> It's a good idea to CC the relevant maintainers if you want their input.
> 
>> Hello,
>>
>> I know that xc_interface_open() can be safely called several times from
>> the same process, and that several processes can each have a bunch of
>> xc_interface handles open, and that I shouldn't use an xc_interface
>> inherited from the parent in a child process, because xenctrl.h says so.
>>
>> Is it safe to assume that the same restrictions / conventions apply to
>> xs_handles obtained via xs_open()? Xenstore.h is not explicit. Looking
>> at the code, it would seem safe to assume that it can be used in a
>> similar manner, but it would be nice to have this confirmed if possible.
> 
> I think there's a pretty good chance that the same applies to xenstore
> connections made over the device/shared-ring interface.
> 
> I'm not really sure about the semantics of a Unix domain socket after a
> fork, but I don't expect both parent and child could sanely make use of
> it.
> 
> So I think the answer is:
> 
>   * Connections made with xs_open(0) (which might be shared page or
> socket based) are only guaranteed to work in the parent after
> fork.
>   * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable
> in either the parent or the child after fork, but not both.
>   * xs_daemon_open*() and xs_domain_open() are deprecated synonyms
> for xs_open(0)
>   * XS_OPEN_READONLY has not bearing on any of this.
> 
> Ian, does that seem right?
> 
> Razvan, assuming Ian concurs with the above (or corrects it) then could
> you knock up a patch to document the result please.

Sure, I'll document whatever gets confirmed.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] missing chunk of HVM direct kernel boot patch

2014-11-28 Thread Stefano Stabellini
On Fri, 28 Nov 2014, Chunyan Liu wrote:
> Found by Stefano, this chunk of the patch was never applied to
> xen-unstable (commit 11dffa2359e8a2629490c14c029c7c7c777b3e47),
> see http://marc.info/?l=qemu-devel&m=140471493425353&w=2.
> 
> Signed-off-by: Chunyan Liu 

I think it should be in 4.5: at the moment the feature is broken and
only half-applied to the tree.

Reviewed-by: Stefano Stabellini 


>  tools/libxl/libxl_dm.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 3e191c3..b25b574 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -527,6 +527,15 @@ static char ** 
> libxl__build_device_model_args_new(libxl__gc *gc,
>  if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
>  int ioemu_nics = 0;
>  
> +if (b_info->kernel)
> +flexarray_vappend(dm_args, "-kernel", b_info->kernel, NULL);
> +
> +if (b_info->ramdisk)
> +flexarray_vappend(dm_args, "-initrd", b_info->ramdisk, NULL);
> +
> +if (b_info->cmdline)
> +flexarray_vappend(dm_args, "-append", b_info->cmdline, NULL);
> +
>  if (b_info->u.hvm.serial || b_info->u.hvm.serial_list) {
>  if ( b_info->u.hvm.serial && b_info->u.hvm.serial_list )
>  {
> -- 
> 1.8.4.5
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about network performance difference between dom0 and host

2014-11-28 Thread Zoltan Kiss

Hi,

I would check whether GRO is enabled under Dom0 or not (ethtool -k 
ethX). Comparing top during test (especially that which context use how 
many percentage), and the number of interrupts per second (grep eth 
/proc/interrupts) would be interesting too.


Regards,

Zoltan

On 28/11/14 02:29, zhangleiqiang wrote:


  I have done some network performance testing under XEN, and found the result 
which is somehow strange:

When testing between Hosts (Sender and receiver servers are both booting 
from kernel-default), running only one netperf process will be enough to reach 
the upper limit of NIC. However, when testing between Dom0s (Sender and 
receiver servers are both booting from kernel-xen), we must run more count of 
netperf process to reach the upper limit of NIC, running only one netperf 
process the throughout was low. The results using one netperf process are as 
follows:

  TCP 512TCP 1460 TCP 4096TCP 8500
Hosts:9134.249444.84  9448.05  9447.58(Mbs)
Dom0s:2063.93018.95 6561.29 5008.17(Mbs)

   The question is why one netperf process cannot reach the max throughout of 
NIC under Dom0 ? I know that Dom0 will have extra overhead when handling 
interrupt which must be handled by hypervisor first, but I think it is not the 
reason for it.

   The testing environment details are as follows:

1. Hardware
a. CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 2 CPU 6 Cores with Hyper 
Thread enabled
b. NIC: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network 
Connection (rev 01)
2. Sofware:
a. HostOS: SUSE SLES 11 SP3 (Kernel 3.0.76)
b. NIC Driver: IXGBE 3.21.2
c. OVS: 2.1.3
d. MTU: 1600
e. Dom0:6U6G
3. Networking Environment:
a. All network flows are transmit/receive through OVS
b. Sender server and receiver server are connected directly between 
10GE NIC
4. Testing Tools:
a. Sender: netperf
b. Receiver: netserver


--
zhangleiqiang (Trump)

Best Regards
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Li, Liang Z
>> -if (!hvm_pse1gb_supported(d))
>> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d))
>>  *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);
>
>With
>
>#define hvm_pse1gb_supported(d) \
>(cpu_has_page1gb && paging_mode_hap(d))

>the change above is pointless. While, considering this, comments on
>v2 may have been misleading, you should have simply updated the patch 
>description instead to clarify why the v2 change was okay even for the shadow 
>mode case.

I checked the code and found that for a normal guest can only be in hap mode or 
shadow mode before I sending the patch, but I am not share if it's true for 
dom0. 

Liang

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Andrew Cooper
On 28/11/14 10:29, Li, Liang Z wrote:
>>> -if (!hvm_pse1gb_supported(d))
>>> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d))
>>>  *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);
>> With
>>
>> #define hvm_pse1gb_supported(d) \
>>(cpu_has_page1gb && paging_mode_hap(d))
>> the change above is pointless. While, considering this, comments on
>> v2 may have been misleading, you should have simply updated the patch 
>> description instead to clarify why the v2 change was okay even for the 
>> shadow mode case.
> I checked the code and found that for a normal guest can only be in hap mode 
> or shadow mode before I sending the patch, but I am not share if it's true 
> for dom0. 
>
> Liang

Dom0 may either be PV (in which case neither hap nor shadow, and cant
use 1GB pages anyway), or experimentally PVH which is currently
restricted to hap.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Li, Liang Z
>>>(cpu_has_page1gb && paging_mode_hap(d)) the change above is 
>>> pointless. While, considering this, comments on
>>> v2 may have been misleading, you should have simply updated the patch 
>>> description instead to clarify why the v2 change was okay even for the 
>>> shadow mode case.
>> I checked the code and found that for a normal guest can only be in hap mode 
>> or shadow mode before I sending the patch, but I am not share if it's true 
>> for dom0. 
>>
>> Liang
>
> Dom0 may either be PV (in which case neither hap nor shadow, and cant use 1GB 
> pages anyway), or experimentally PVH which is currently restricted to hap.
>
>~Andrew

Thanks for clarification.  I will resend the v2 patch.

Liang




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 11:29,  wrote:
>> > -if (!hvm_pse1gb_supported(d))
>>> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d))
>>>  *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);
>>
>>With
>>
>>#define hvm_pse1gb_supported(d) \
>>(cpu_has_page1gb && paging_mode_hap(d))
> 
>>the change above is pointless. While, considering this, comments on
>>v2 may have been misleading, you should have simply updated the patch 
> description instead to clarify why the v2 change was okay even for the shadow 
> mode case.
> 
> I checked the code and found that for a normal guest can only be in hap mode 
> or shadow mode before I sending the patch, but I am not share if it's true 
> for dom0. 

The CPUID code in libxc doesn't apply to Dom0 at all, and CPUID
handling is also special cased in the hypervisor for Dom0. Plus
finally Dom0 only possibly being PV or PVH, PVH requiring HAP
and PV generally not allowing large pages anyway, your concern
is unnecessary.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V4 08/10] xen: Hide get_phys_to_machine() to be able to tune common path

2014-11-28 Thread Juergen Gross
Today get_phys_to_machine() is always called when the mfn for a pfn
is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
to be able to avoid calling get_phys_to_machine() when possible as
soon as the switch to a linear mapped p2m list has been done.

Signed-off-by: Juergen Gross 
Reviewed-by: David Vrabel 
---
 arch/x86/include/asm/xen/page.h | 26 --
 arch/x86/xen/mmu.c  |  2 +-
 arch/x86/xen/p2m.c  |  6 +++---
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 28fa795..410a6ec 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -59,6 +59,21 @@ extern int clear_foreign_p2m_mapping(struct 
gnttab_unmap_grant_ref *unmap_ops,
 struct page **pages, unsigned int count);
 extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long 
pfn);
 
+/*
+ * When to use pfn_to_mfn(), __pfn_to_mfn() or get_phys_to_machine():
+ * - pfn_to_mfn() returns either INVALID_P2M_ENTRY or the mfn. No indicator
+ *   bits (identity or foreign) are set.
+ * - __pfn_to_mfn() returns the found entry of the p2m table. A possibly set
+ *   identity or foreign indicator will be still set. __pfn_to_mfn() is
+ *   encapsulating get_phys_to_machine().
+ * - get_phys_to_machine() is to be called by __pfn_to_mfn() only to allow
+ *   for future optimizations.
+ */
+static inline unsigned long __pfn_to_mfn(unsigned long pfn)
+{
+   return get_phys_to_machine(pfn);
+}
+
 static inline unsigned long pfn_to_mfn(unsigned long pfn)
 {
unsigned long mfn;
@@ -66,7 +81,7 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn)
if (xen_feature(XENFEAT_auto_translated_physmap))
return pfn;
 
-   mfn = get_phys_to_machine(pfn);
+   mfn = __pfn_to_mfn(pfn);
 
if (mfn != INVALID_P2M_ENTRY)
mfn &= ~(FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT);
@@ -79,7 +94,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long 
pfn)
if (xen_feature(XENFEAT_auto_translated_physmap))
return 1;
 
-   return get_phys_to_machine(pfn) != INVALID_P2M_ENTRY;
+   return __pfn_to_mfn(pfn) != INVALID_P2M_ENTRY;
 }
 
 static inline unsigned long mfn_to_pfn_no_overrides(unsigned long mfn)
@@ -113,7 +128,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
return mfn;
 
pfn = mfn_to_pfn_no_overrides(mfn);
-   if (get_phys_to_machine(pfn) != mfn) {
+   if (__pfn_to_mfn(pfn) != mfn) {
/*
 * If this appears to be a foreign mfn (because the pfn
 * doesn't map back to the mfn), then check the local override
@@ -129,8 +144,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 * entry doesn't map back to the mfn and m2p_override doesn't have a
 * valid entry for it.
 */
-   if (pfn == ~0 &&
-   get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
+   if (pfn == ~0 && __pfn_to_mfn(mfn) == IDENTITY_FRAME(mfn))
pfn = mfn;
 
return pfn;
@@ -176,7 +190,7 @@ static inline unsigned long mfn_to_local_pfn(unsigned long 
mfn)
return mfn;
 
pfn = mfn_to_pfn(mfn);
-   if (get_phys_to_machine(pfn) != mfn)
+   if (__pfn_to_mfn(pfn) != mfn)
return -1; /* force !pfn_valid() */
return pfn;
 }
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 601914d..3e3f8f8 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -387,7 +387,7 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
unsigned long mfn;
 
if (!xen_feature(XENFEAT_auto_translated_physmap))
-   mfn = get_phys_to_machine(pfn);
+   mfn = __pfn_to_mfn(pfn);
else
mfn = pfn;
/*
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index eddec40..8c3d8fb 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -787,7 +787,7 @@ static int m2p_add_override(unsigned long mfn, struct page 
*page,
 * because mfn_to_pfn (that ends up being called by GUPF) will
 * return the backend pfn rather than the frontend pfn. */
pfn = mfn_to_pfn_no_overrides(mfn);
-   if (get_phys_to_machine(pfn) == mfn)
+   if (__pfn_to_mfn(pfn) == mfn)
set_phys_to_machine(pfn, FOREIGN_FRAME(mfn));
 
return 0;
@@ -967,7 +967,7 @@ static int m2p_remove_override(struct page *page,
 * pfn again. */
mfn &= ~FOREIGN_FRAME_BIT;
pfn = mfn_to_pfn_no_overrides(mfn);
-   if (get_phys_to_machine(pfn) == FOREIGN_FRAME(mfn) &&
+   if (__pfn_to_mfn(pfn) == FOREIGN_FRAME(mfn) &&
m2p_find_override(mfn) == NULL)
set_phys_to_machine(pfn, mfn);
 
@@ -992,7 +992,7 @@ int clear_foreign_p2m

[Xen-devel] [PATCH V4 00/10] xen: Switch to virtual mapped linear p2m list

2014-11-28 Thread Juergen Gross
Paravirtualized kernels running on Xen use a three level tree for
translation of guest specific physical addresses to machine global
addresses. This p2m tree is used for construction of page table
entries, so the p2m tree walk is performance critical.

By using a linear virtual mapped p2m list accesses to p2m elements
can be sped up while even simplifying code. To achieve this goal
some p2m related initializations have to be performed later in the
boot process, as the final p2m list can be set up only after basic
memory management functions are available.

Changes in V4:
- carved out style corrections into new patch 1 as requested by David Vrabel
- minor modification in patch 5 (former patch 3) as suggested by David Vrabel
- carved out change of page allocation scheme from p2m.c into own patch as
  requested by Konrad Wilk, corrected for 32 bit as requested by Andrew Cooper
- added several comments
- simplified error handling in patch 4 (former patch 2)

Changes in V3:
- Carved out (new) patch 1 to make pure code movement more obvious
  as requested by David Vrabel
- New patch 6 introducing __pfn_to_mfn() (taken from patch 7) as
  requested by David Vrabel
- New patch 8 to speed up set_phys_to_machine() as suggested by
  David Vrabel

Changes in V2:
- splitted patch 2 in 4 smaller ones as requested by David Vrabel
- added highmem check when remapping kernel memory as requested by
  David Vrabel


Juergen Gross (10):
  xen: fix some style issues in p2m.c
  xen: Make functions static
  xen: use common page allocation function in p2m.c
  xen: Delay remapping memory of pv-domain
  xen: Delay m2p_override initialization
  xen: Delay invalidating extra memory
  x86: Introduce function to get pmd entry pointer
  xen: Hide get_phys_to_machine() to be able to tune common path
  xen: switch to linear virtual mapped sparse p2m list
  xen: Speed up set_phys_to_machine() by using read-only mappings

 arch/x86/include/asm/pgtable_types.h |1 +
 arch/x86/include/asm/xen/page.h  |   48 +-
 arch/x86/mm/pageattr.c   |   20 +
 arch/x86/xen/mmu.c   |   38 +-
 arch/x86/xen/p2m.c   | 1320 ++
 arch/x86/xen/setup.c |  443 ++--
 arch/x86/xen/xen-ops.h   |6 +-
 7 files changed, 838 insertions(+), 1038 deletions(-)

-- 
2.1.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V4 03/10] xen: use common page allocation function in p2m.c

2014-11-28 Thread Juergen Gross
In arch/x86/xen/p2m.c three different allocation functions for
obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
or __get_free_page().  Which of those functions is used depends on the
progress of the boot process of the system.

Introduce a common allocation routine selecting the to be called
allocation routine dynamically based on the boot progress. This allows
moving initialization steps without having to care about changing
allocation calls.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/mmu.c |  2 ++
 arch/x86/xen/p2m.c | 57 +-
 2 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index a8a1a3d..b995b871 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1219,6 +1219,8 @@ static void __init xen_pagetable_init(void)
paging_init();
 #ifdef CONFIG_X86_64
xen_pagetable_p2m_copy();
+#else
+   xen_revector_p2m_tree();
 #endif
/* Allocate and initialize top and mid mfn levels for p2m structure */
xen_build_mfn_list_list();
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 2d8b908..fa53dc2 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -164,6 +164,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -204,6 +205,8 @@ RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / 
(P2M_PER_PAGE * P2M_MID_PER
  */
 RESERVE_BRK(p2m_identity_remap, PAGE_SIZE * 2 * 3 * MAX_REMAP_RANGES);
 
+static int use_brk = 1;
+
 static inline unsigned p2m_top_index(unsigned long pfn)
 {
BUG_ON(pfn >= MAX_P2M_PFN);
@@ -268,6 +271,24 @@ static void p2m_init(unsigned long *p2m)
p2m[i] = INVALID_P2M_ENTRY;
 }
 
+static void * __ref alloc_p2m_page(void)
+{
+   if (unlikely(use_brk))
+   return extend_brk(PAGE_SIZE, PAGE_SIZE);
+
+   if (unlikely(!slab_is_available()))
+   return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
+
+   return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT);
+}
+
+/* Only to be called in case of a race for a page just allocated! */
+static void free_p2m_page(void *p)
+{
+   BUG_ON(!slab_is_available());
+   free_page((unsigned long)p);
+}
+
 /*
  * Build the parallel p2m_top_mfn and p2m_mid_mfn structures
  *
@@ -287,13 +308,13 @@ void __ref xen_build_mfn_list_list(void)
 
/* Pre-initialize p2m_top_mfn to be completely missing */
if (p2m_top_mfn == NULL) {
-   p2m_mid_missing_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
+   p2m_mid_missing_mfn = alloc_p2m_page();
p2m_mid_mfn_init(p2m_mid_missing_mfn, p2m_missing);
 
-   p2m_top_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
+   p2m_top_mfn_p = alloc_p2m_page();
p2m_top_mfn_p_init(p2m_top_mfn_p);
 
-   p2m_top_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
+   p2m_top_mfn = alloc_p2m_page();
p2m_top_mfn_init(p2m_top_mfn);
} else {
/* Reinitialise, mfn's all change after migration */
@@ -327,7 +348,7 @@ void __ref xen_build_mfn_list_list(void)
 * missing parts of the mfn tree after
 * runtime.
 */
-   mid_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
+   mid_mfn_p = alloc_p2m_page();
p2m_mid_mfn_init(mid_mfn_p, p2m_missing);
 
p2m_top_mfn_p[topidx] = mid_mfn_p;
@@ -364,17 +385,17 @@ void __init xen_build_dynamic_phys_to_machine(void)
max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
xen_max_p2m_pfn = max_pfn;
 
-   p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   p2m_missing = alloc_p2m_page();
p2m_init(p2m_missing);
-   p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   p2m_identity = alloc_p2m_page();
p2m_init(p2m_identity);
 
-   p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   p2m_mid_missing = alloc_p2m_page();
p2m_mid_init(p2m_mid_missing, p2m_missing);
-   p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   p2m_mid_identity = alloc_p2m_page();
p2m_mid_init(p2m_mid_identity, p2m_identity);
 
-   p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   p2m_top = alloc_p2m_page();
p2m_top_init(p2m_top);
 
/*
@@ -387,7 +408,7 @@ void __init xen_build_dynamic_phys_to_machine(void)
unsigned mididx = p2m_mid_index(pfn);
 
if (p2m_top[topidx] == p2m_mid_missing) {
-   unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   unsigned long **mid = alloc_p2m_page();
p2m_mid_init(mid, p2m_missing);
 
p2m_top[topidx] = mid;
@@ -420,6 +441,7 @@ unsigned long __init xen_revector_p2m_tree(void)
unsigned long *mfn_list = N

[Xen-devel] [PATCH V4 05/10] xen: Delay m2p_override initialization

2014-11-28 Thread Juergen Gross
The m2p overrides are used to be able to find the local pfn for a
foreign mfn mapped into the domain. They are used by driver backends
having to access frontend data.

As this functionality isn't used in early boot it makes no sense to
initialize the m2p override functions very early. It can be done
later without doing any harm, removing the need for allocating memory
via extend_brk().

While at it make some m2p override functions static as they are only
used internally.

Signed-off-by: Juergen Gross 
Reviewed-by: David Vrabel 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/p2m.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 24cd9d1..8676f35 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -428,8 +428,6 @@ void __init xen_build_dynamic_phys_to_machine(void)
}
p2m_top[topidx][mididx] = &mfn_list[pfn];
}
-
-   m2p_override_init();
 }
 #ifdef CONFIG_X86_64
 unsigned long __init xen_revector_p2m_tree(void)
@@ -500,13 +498,15 @@ unsigned long __init xen_revector_p2m_tree(void)
}
/* This should be the leafs allocated for identity from _brk. */
}
-   return (unsigned long)mfn_list;
 
+   m2p_override_init();
+   return (unsigned long)mfn_list;
 }
 #else
 unsigned long __init xen_revector_p2m_tree(void)
 {
use_brk = 0;
+   m2p_override_init();
return 0;
 }
 #endif
@@ -796,15 +796,16 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long 
mfn)
 #define M2P_OVERRIDE_HASH_SHIFT10
 #define M2P_OVERRIDE_HASH  (1 << M2P_OVERRIDE_HASH_SHIFT)
 
-static RESERVE_BRK_ARRAY(struct list_head, m2p_overrides, M2P_OVERRIDE_HASH);
+static struct list_head *m2p_overrides;
 static DEFINE_SPINLOCK(m2p_override_lock);
 
 static void __init m2p_override_init(void)
 {
unsigned i;
 
-   m2p_overrides = extend_brk(sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
-  sizeof(unsigned long));
+   m2p_overrides = alloc_bootmem_align(
+   sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
+   sizeof(unsigned long));
 
for (i = 0; i < M2P_OVERRIDE_HASH; i++)
INIT_LIST_HEAD(&m2p_overrides[i]);
@@ -932,10 +933,14 @@ EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping);
 static struct page *m2p_find_override(unsigned long mfn)
 {
unsigned long flags;
-   struct list_head *bucket = &m2p_overrides[mfn_hash(mfn)];
+   struct list_head *bucket;
struct page *p, *ret;
 
+   if (unlikely(!m2p_overrides))
+   return NULL;
+
ret = NULL;
+   bucket = &m2p_overrides[mfn_hash(mfn)];
 
spin_lock_irqsave(&m2p_override_lock, flags);
 
-- 
2.1.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V4 2/9] xen: Make functions static

2014-11-28 Thread Juergen Gross
Some functions in arch/x86/xen/p2m.c are used locally only. Make them
static. Rearrange the functions in p2m.c to avoid forward declarations.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/xen/page.h |   6 -
 arch/x86/xen/p2m.c  | 346 
 2 files changed, 172 insertions(+), 180 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index c949923..6c16451 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -52,15 +52,9 @@ extern unsigned long set_phys_range_identity(unsigned long 
pfn_s,
 extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
   struct gnttab_map_grant_ref *kmap_ops,
   struct page **pages, unsigned int count);
-extern int m2p_add_override(unsigned long mfn, struct page *page,
-   struct gnttab_map_grant_ref *kmap_op);
 extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 struct gnttab_map_grant_ref *kmap_ops,
 struct page **pages, unsigned int count);
-extern int m2p_remove_override(struct page *page,
-  struct gnttab_map_grant_ref *kmap_op,
-  unsigned long mfn);
-extern struct page *m2p_find_override(unsigned long mfn);
 extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long 
pfn);
 
 static inline unsigned long pfn_to_mfn(unsigned long pfn)
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 73d354a..2d8b908 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -896,6 +896,61 @@ static unsigned long mfn_hash(unsigned long mfn)
return hash_long(mfn, M2P_OVERRIDE_HASH_SHIFT);
 }
 
+/* Add an MFN override for a particular page */
+static int m2p_add_override(unsigned long mfn, struct page *page,
+   struct gnttab_map_grant_ref *kmap_op)
+{
+   unsigned long flags;
+   unsigned long pfn;
+   unsigned long uninitialized_var(address);
+   unsigned level;
+   pte_t *ptep = NULL;
+
+   pfn = page_to_pfn(page);
+   if (!PageHighMem(page)) {
+   address = (unsigned long)__va(pfn << PAGE_SHIFT);
+   ptep = lookup_address(address, &level);
+   if (WARN(ptep == NULL || level != PG_LEVEL_4K,
+"m2p_add_override: pfn %lx not mapped", pfn))
+   return -EINVAL;
+   }
+
+   if (kmap_op != NULL) {
+   if (!PageHighMem(page)) {
+   struct multicall_space mcs =
+   xen_mc_entry(sizeof(*kmap_op));
+
+   MULTI_grant_table_op(mcs.mc,
+   GNTTABOP_map_grant_ref, kmap_op, 1);
+
+   xen_mc_issue(PARAVIRT_LAZY_MMU);
+   }
+   }
+   spin_lock_irqsave(&m2p_override_lock, flags);
+   list_add(&page->lru,  &m2p_overrides[mfn_hash(mfn)]);
+   spin_unlock_irqrestore(&m2p_override_lock, flags);
+
+   /* p2m(m2p(mfn)) == mfn: the mfn is already present somewhere in
+* this domain. Set the FOREIGN_FRAME_BIT in the p2m for the other
+* pfn so that the following mfn_to_pfn(mfn) calls will return the
+* pfn from the m2p_override (the backend pfn) instead.
+* We need to do this because the pages shared by the frontend
+* (xen-blkfront) can be already locked (lock_page, called by
+* do_read_cache_page); when the userspace backend tries to use them
+* with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
+* do_blockdev_direct_IO is going to try to lock the same pages
+* again resulting in a deadlock.
+* As a side effect get_user_pages_fast might not be safe on the
+* frontend pages while they are being shared with the backend,
+* because mfn_to_pfn (that ends up being called by GUPF) will
+* return the backend pfn rather than the frontend pfn. */
+   pfn = mfn_to_pfn_no_overrides(mfn);
+   if (get_phys_to_machine(pfn) == mfn)
+   set_phys_to_machine(pfn, FOREIGN_FRAME(mfn));
+
+   return 0;
+}
+
 int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
@@ -955,61 +1010,123 @@ out:
 }
 EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping);
 
-/* Add an MFN override for a particular page */
-int m2p_add_override(unsigned long mfn, struct page *page,
-   struct gnttab_map_grant_ref *kmap_op)
-{
-   unsigned long flags;
-   unsigned long pfn;
-   unsigned long uninitialized_var(address);
-   unsigned level;
-   pte_t *ptep = NULL;
-
-   pfn = page_to_pfn(page);
-   if (!PageHighMem(page)) {
-

[Xen-devel] [PATCH V4 1/9] xen: fix some style issues in p2m.c

2014-11-28 Thread Juergen Gross
The source arch/x86/xen/p2m.c has some coding style issues. Fix them.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/p2m.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 9201a38..73d354a 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -922,7 +922,7 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref 
*map_ops,
continue;
 
if (map_ops[i].flags & GNTMAP_contains_pte) {
-   pte = (pte_t *) 
(mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) +
+   pte = (pte_t 
*)(mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) +
(map_ops[i].host_addr & ~PAGE_MASK));
mfn = pte_mfn(*pte);
} else {
@@ -970,7 +970,7 @@ int m2p_add_override(unsigned long mfn, struct page *page,
address = (unsigned long)__va(pfn << PAGE_SHIFT);
ptep = lookup_address(address, &level);
if (WARN(ptep == NULL || level != PG_LEVEL_4K,
-   "m2p_add_override: pfn %lx not mapped", 
pfn))
+"m2p_add_override: pfn %lx not mapped", pfn))
return -EINVAL;
}
 
@@ -1072,7 +1072,7 @@ int m2p_remove_override(struct page *page,
ptep = lookup_address(address, &level);
 
if (WARN(ptep == NULL || level != PG_LEVEL_4K,
-   "m2p_remove_override: pfn %lx not 
mapped", pfn))
+"m2p_remove_override: pfn %lx not mapped", pfn))
return -EINVAL;
}
 
@@ -1102,9 +1102,8 @@ int m2p_remove_override(struct page *page,
 * hypercall actually returned an error.
 */
if (kmap_op->handle == GNTST_general_error) {
-   printk(KERN_WARNING "m2p_remove_override: "
-   "pfn %lx mfn %lx, failed to 
modify kernel mappings",
-   pfn, mfn);
+   pr_warn("m2p_remove_override: pfn %lx mfn %lx, 
failed to modify kernel mappings",
+   pfn, mfn);
put_balloon_scratch_page();
return -1;
}
@@ -1112,14 +,14 @@ int m2p_remove_override(struct page *page,
xen_mc_batch();
 
mcs = __xen_mc_entry(
-   sizeof(struct 
gnttab_unmap_and_replace));
+   sizeof(struct gnttab_unmap_and_replace));
unmap_op = mcs.args;
unmap_op->host_addr = kmap_op->host_addr;
unmap_op->new_addr = scratch_page_address;
unmap_op->handle = kmap_op->handle;
 
MULTI_grant_table_op(mcs.mc,
-   GNTTABOP_unmap_and_replace, unmap_op, 
1);
+   GNTTABOP_unmap_and_replace, unmap_op, 1);
 
mcs = __xen_mc_entry(0);
MULTI_update_va_mapping(mcs.mc, scratch_page_address,
-- 
2.1.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V4 09/10] xen: switch to linear virtual mapped sparse p2m list

2014-11-28 Thread Juergen Gross
At start of the day the Xen hypervisor presents a contiguous mfn list
to a pv-domain. In order to support sparse memory this mfn list is
accessed via a three level p2m tree built early in the boot process.
Whenever the system needs the mfn associated with a pfn this tree is
used to find the mfn.

Instead of using a software walked tree for accessing a specific mfn
list entry this patch is creating a virtual address area for the
entire possible mfn list including memory holes. The holes are
covered by mapping a pre-defined  page consisting only of "invalid
mfn" entries. Access to a mfn entry is possible by just using the
virtual base address of the mfn list and the pfn as index into that
list. This speeds up the (hot) path of determining the mfn of a
pfn.

Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
showed following improvements:

Elapsed time: 32:50 ->  32:35
System:   18:07 ->  17:47
User:104:00 -> 103:30

Tested with following configurations:
- 64 bit dom0, 8GB RAM
- 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
- 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
the patch)
- 32 bit domU, ballooning up and down
- 32 bit domU, save and restore
- 32 bit domU with PCI passthrough
- 64 bit domU, 8 GB, 2049 MB, 5000 MB
- 64 bit domU, ballooning up and down
- 64 bit domU, save and restore
- 64 bit domU with PCI passthrough

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/xen/page.h |  20 +-
 arch/x86/xen/mmu.c  |  34 +-
 arch/x86/xen/p2m.c  | 735 +---
 arch/x86/xen/xen-ops.h  |   2 +-
 4 files changed, 347 insertions(+), 444 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 410a6ec..f5d5de4 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -65,13 +65,25 @@ extern unsigned long m2p_find_override_pfn(unsigned long 
mfn, unsigned long pfn)
  *   bits (identity or foreign) are set.
  * - __pfn_to_mfn() returns the found entry of the p2m table. A possibly set
  *   identity or foreign indicator will be still set. __pfn_to_mfn() is
- *   encapsulating get_phys_to_machine().
- * - get_phys_to_machine() is to be called by __pfn_to_mfn() only to allow
- *   for future optimizations.
+ *   encapsulating get_phys_to_machine() which is called in special cases only.
+ * - get_phys_to_machine() is to be called by __pfn_to_mfn() only in special
+ *   cases needing an extended handling.
  */
 static inline unsigned long __pfn_to_mfn(unsigned long pfn)
 {
-   return get_phys_to_machine(pfn);
+   unsigned long mfn;
+
+   if (pfn < xen_p2m_size)
+   mfn = xen_p2m_addr[pfn];
+   else if (unlikely(pfn < xen_max_p2m_pfn))
+   return get_phys_to_machine(pfn);
+   else
+   return IDENTITY_FRAME(pfn);
+
+   if (unlikely(mfn == INVALID_P2M_ENTRY))
+   return get_phys_to_machine(pfn);
+
+   return mfn;
 }
 
 static inline unsigned long pfn_to_mfn(unsigned long pfn)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3e3f8f8..6ab6150 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1158,20 +1158,16 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
 * instead of somewhere later and be confusing. */
xen_mc_flush();
 }
-static void __init xen_pagetable_p2m_copy(void)
+
+static void __init xen_pagetable_p2m_free(void)
 {
unsigned long size;
unsigned long addr;
-   unsigned long new_mfn_list;
-
-   if (xen_feature(XENFEAT_auto_translated_physmap))
-   return;
 
size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
 
-   new_mfn_list = xen_revector_p2m_tree();
/* No memory or already called. */
-   if (!new_mfn_list || new_mfn_list == xen_start_info->mfn_list)
+   if ((unsigned long)xen_p2m_addr == xen_start_info->mfn_list)
return;
 
/* using __ka address and sticking INVALID_P2M_ENTRY! */
@@ -1189,8 +1185,6 @@ static void __init xen_pagetable_p2m_copy(void)
 
size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
memblock_free(__pa(xen_start_info->mfn_list), size);
-   /* And revector! Bye bye old array */
-   xen_start_info->mfn_list = new_mfn_list;
 
/* At this stage, cleanup_highmap has already cleaned __ka space
 * from _brk_limit way up to the max_pfn_mapped (which is the end of
@@ -1214,14 +1208,26 @@ static void __init xen_pagetable_p2m_copy(void)
 }
 #endif
 
-static void __init xen_pagetable_init(void)
+static void __init xen_pagetable_p2m_setup(void)
 {
-   paging_init();
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
+   xen_vmalloc_p2m_tree();
+
 #ifdef CONFIG_X86_64
-   xen_pagetable_p2m_copy();
-#else
-   xen_revector_p2m_tree();
+   xen_pagetable_p2m_free()

[Xen-devel] [PATCH V4 04/10] xen: Delay remapping memory of pv-domain

2014-11-28 Thread Juergen Gross
Early in the boot process the memory layout of a pv-domain is changed
to match the E820 map (either the host one for Dom0 or the Xen one)
regarding placement of RAM and PCI holes. This requires removing memory
pages initially located at positions not suitable for RAM and adding
them later at higher addresses where no restrictions apply.

To be able to operate on the hypervisor supported p2m list until a
virtual mapped linear p2m list can be constructed, remapping must
be delayed until virtual memory management is initialized, as the
initial p2m list can't be extended unlimited at physical memory
initialization time due to it's fixed structure.

A further advantage is the reduction in complexity and code volume as
we don't have to be careful regarding memory restrictions during p2m
updates.

Signed-off-by: Juergen Gross 
Reviewed-by: David Vrabel 
---
 arch/x86/include/asm/xen/page.h |   1 -
 arch/x86/xen/mmu.c  |   4 +
 arch/x86/xen/p2m.c  |  94 --
 arch/x86/xen/setup.c| 369 ++--
 arch/x86/xen/xen-ops.h  |   1 +
 5 files changed, 172 insertions(+), 297 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 6c16451..b475297 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -44,7 +44,6 @@ extern unsigned long  machine_to_phys_nr;
 
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
-extern bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long 
mfn);
 extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern unsigned long set_phys_range_identity(unsigned long pfn_s,
 unsigned long pfn_e);
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b995b871..601914d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1225,6 +1225,10 @@ static void __init xen_pagetable_init(void)
/* Allocate and initialize top and mid mfn levels for p2m structure */
xen_build_mfn_list_list();
 
+   /* Remap memory freed due to conflicts with E820 map */
+   if (!xen_feature(XENFEAT_auto_translated_physmap))
+   xen_remap_memory();
+
xen_setup_shared_info();
xen_post_allocator_init();
 }
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index fa53dc2..24cd9d1 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -662,100 +662,6 @@ static bool __init early_alloc_p2m_middle(unsigned long 
pfn)
return true;
 }
 
-/*
- * Skim over the P2M tree looking at pages that are either filled with
- * INVALID_P2M_ENTRY or with 1:1 PFNs. If found, re-use that page and
- * replace the P2M leaf with a p2m_missing or p2m_identity.
- * Stick the old page in the new P2M tree location.
- */
-static bool __init early_can_reuse_p2m_middle(unsigned long set_pfn)
-{
-   unsigned topidx;
-   unsigned mididx;
-   unsigned ident_pfns;
-   unsigned inv_pfns;
-   unsigned long *p2m;
-   unsigned idx;
-   unsigned long pfn;
-
-   /* We only look when this entails a P2M middle layer */
-   if (p2m_index(set_pfn))
-   return false;
-
-   for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) {
-   topidx = p2m_top_index(pfn);
-
-   if (!p2m_top[topidx])
-   continue;
-
-   if (p2m_top[topidx] == p2m_mid_missing)
-   continue;
-
-   mididx = p2m_mid_index(pfn);
-   p2m = p2m_top[topidx][mididx];
-   if (!p2m)
-   continue;
-
-   if ((p2m == p2m_missing) || (p2m == p2m_identity))
-   continue;
-
-   if ((unsigned long)p2m == INVALID_P2M_ENTRY)
-   continue;
-
-   ident_pfns = 0;
-   inv_pfns = 0;
-   for (idx = 0; idx < P2M_PER_PAGE; idx++) {
-   /* IDENTITY_PFNs are 1:1 */
-   if (p2m[idx] == IDENTITY_FRAME(pfn + idx))
-   ident_pfns++;
-   else if (p2m[idx] == INVALID_P2M_ENTRY)
-   inv_pfns++;
-   else
-   break;
-   }
-   if ((ident_pfns == P2M_PER_PAGE) || (inv_pfns == P2M_PER_PAGE))
-   goto found;
-   }
-   return false;
-found:
-   /* Found one, replace old with p2m_identity or p2m_missing */
-   p2m_top[topidx][mididx] = (ident_pfns ? p2m_identity : p2m_missing);
-
-   /* Reset where we want to stick the old page in. */
-   topidx = p2m_top_index(set_pfn);
-   mididx = p2m_mid_index(set_pfn);
-
-   /* This shouldn't happen */
-   if (WARN_ON(p2m_top[topidx] == p2m_mid_missing))
-   early_alloc_p2m_middle(set_pfn);
-

[Xen-devel] [PATCH V4 07/10] x86: Introduce function to get pmd entry pointer

2014-11-28 Thread Juergen Gross
Introduces lookup_pmd_address() to get the address of the pmd entry
related to a virtual address in the current address space. This
function is needed for support of a virtual mapped sparse p2m list
in xen pv domains, as we need the address of the pmd entry, not the
one of the pte in that case.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/pgtable_types.h |  1 +
 arch/x86/mm/pageattr.c   | 20 
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 0778964..d83f5e7 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -396,6 +396,7 @@ static inline void update_page_count(int level, unsigned 
long pages) { }
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
unsigned int *level);
+extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
   unsigned numpages, unsigned long page_flags);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 36de293..1298108 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -384,6 +384,26 @@ static pte_t *_lookup_address_cpa(struct cpa_data *cpa, 
unsigned long address,
 }
 
 /*
+ * Lookup the PMD entry for a virtual address. Return a pointer to the entry
+ * or NULL if not present.
+ */
+pmd_t *lookup_pmd_address(unsigned long address)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+
+   pgd = pgd_offset_k(address);
+   if (pgd_none(*pgd))
+   return NULL;
+
+   pud = pud_offset(pgd, address);
+   if (pud_none(*pud) || pud_large(*pud) || !pud_present(*pud))
+   return NULL;
+
+   return pmd_offset(pud, address);
+}
+
+/*
  * This is necessary because __pa() does not work on some
  * kinds of memory, like vmalloc() or the alloc_remap()
  * areas on 32-bit NUMA systems.  The percpu areas can
-- 
2.1.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V4 06/10] xen: Delay invalidating extra memory

2014-11-28 Thread Juergen Gross
When the physical memory configuration is initialized the p2m entries
for not pouplated memory pages are set to "invalid". As those pages
are beyond the hypervisor built p2m list the p2m tree has to be
extended.

This patch delays processing the extra memory related p2m entries
during the boot process until some more basic memory management
functions are callable. This removes the need to create new p2m
entries until virtual memory management is available.

Signed-off-by: Juergen Gross 
Reviewed-by: David Vrabel 
---
 arch/x86/include/asm/xen/page.h |   3 +
 arch/x86/xen/p2m.c  | 128 
 arch/x86/xen/setup.c|  98 ++
 arch/x86/xen/xen-ops.h  |   3 +-
 4 files changed, 103 insertions(+), 129 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index b475297..28fa795 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -41,6 +41,9 @@ typedef struct xpaddr {
 
 extern unsigned long *machine_to_phys_mapping;
 extern unsigned long  machine_to_phys_nr;
+extern unsigned long *xen_p2m_addr;
+extern unsigned long  xen_p2m_size;
+extern unsigned long  xen_max_p2m_pfn;
 
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 8676f35..eddec40 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -181,7 +181,12 @@
 
 static void __init m2p_override_init(void);
 
+unsigned long *xen_p2m_addr __read_mostly;
+EXPORT_SYMBOL_GPL(xen_p2m_addr);
+unsigned long xen_p2m_size __read_mostly;
+EXPORT_SYMBOL_GPL(xen_p2m_size);
 unsigned long xen_max_p2m_pfn __read_mostly;
+EXPORT_SYMBOL_GPL(xen_max_p2m_pfn);
 
 static unsigned long *p2m_mid_missing_mfn;
 static unsigned long *p2m_top_mfn;
@@ -198,13 +203,6 @@ static RESERVE_BRK_ARRAY(unsigned long *, 
p2m_mid_identity, P2M_MID_PER_PAGE);
 
 RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * 
P2M_MID_PER_PAGE)));
 
-/* For each I/O range remapped we may lose up to two leaf pages for the 
boundary
- * violations and three mid pages to cover up to 3GB. With
- * early_can_reuse_p2m_middle() most of the leaf pages will be reused by the
- * remapped region.
- */
-RESERVE_BRK(p2m_identity_remap, PAGE_SIZE * 2 * 3 * MAX_REMAP_RANGES);
-
 static int use_brk = 1;
 
 static inline unsigned p2m_top_index(unsigned long pfn)
@@ -381,9 +379,11 @@ void __init xen_build_dynamic_phys_to_machine(void)
 if (xen_feature(XENFEAT_auto_translated_physmap))
return;
 
+   xen_p2m_addr = (unsigned long *)xen_start_info->mfn_list;
mfn_list = (unsigned long *)xen_start_info->mfn_list;
max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
xen_max_p2m_pfn = max_pfn;
+   xen_p2m_size = max_pfn;
 
p2m_missing = alloc_p2m_page();
p2m_init(p2m_missing);
@@ -499,6 +499,11 @@ unsigned long __init xen_revector_p2m_tree(void)
/* This should be the leafs allocated for identity from _brk. */
}
 
+   xen_p2m_size = xen_max_p2m_pfn;
+   xen_p2m_addr = mfn_list;
+
+   xen_inv_extra_mem();
+
m2p_override_init();
return (unsigned long)mfn_list;
 }
@@ -506,6 +511,8 @@ unsigned long __init xen_revector_p2m_tree(void)
 unsigned long __init xen_revector_p2m_tree(void)
 {
use_brk = 0;
+   xen_p2m_size = xen_max_p2m_pfn;
+   xen_inv_extra_mem();
m2p_override_init();
return 0;
 }
@@ -514,8 +521,12 @@ unsigned long get_phys_to_machine(unsigned long pfn)
 {
unsigned topidx, mididx, idx;
 
-   if (unlikely(pfn >= MAX_P2M_PFN))
+   if (unlikely(pfn >= xen_p2m_size)) {
+   if (pfn < xen_max_p2m_pfn)
+   return xen_chk_extra_mem(pfn);
+
return IDENTITY_FRAME(pfn);
+   }
 
topidx = p2m_top_index(pfn);
mididx = p2m_mid_index(pfn);
@@ -613,78 +624,12 @@ static bool alloc_p2m(unsigned long pfn)
return true;
 }
 
-static bool __init early_alloc_p2m(unsigned long pfn, bool check_boundary)
-{
-   unsigned topidx, mididx, idx;
-   unsigned long *p2m;
-
-   topidx = p2m_top_index(pfn);
-   mididx = p2m_mid_index(pfn);
-   idx = p2m_index(pfn);
-
-   /* Pfff.. No boundary cross-over, lets get out. */
-   if (!idx && check_boundary)
-   return false;
-
-   WARN(p2m_top[topidx][mididx] == p2m_identity,
-   "P2M[%d][%d] == IDENTITY, should be MISSING (or alloced)!\n",
-   topidx, mididx);
-
-   /*
-* Could be done by xen_build_dynamic_phys_to_machine..
-*/
-   if (p2m_top[topidx][mididx] != p2m_missing)
-   return false;
-
-   /* Boundary cross-over for the edges: */
-   p2m = alloc_p2m_page();
-
-   p2m_init(p2m);
-
-   p2m_top[topidx][mididx] = p2m;
-
-  

[Xen-devel] [PATCH V4 10/10] xen: Speed up set_phys_to_machine() by using read-only mappings

2014-11-28 Thread Juergen Gross
Instead of checking at each call of set_phys_to_machine() whether a
new p2m page has to be allocated due to writing an entry in a large
invalid or identity area, just map those areas read only and react
to a page fault on write by allocating the new page.

This change will make the common path with no allocation much
faster as it only requires a single write of the new mfn instead
of walking the address translation tables and checking for the
special cases.

Suggested-by: David Vrabel 
Signed-off-by: Juergen Gross 
Reviewed-by: David Vrabel 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/p2m.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 7d84473..8b5db51 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -70,6 +70,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -316,9 +317,9 @@ static void __init xen_rebuild_p2m_list(unsigned long *p2m)
paravirt_alloc_pte(&init_mm, __pa(p2m_identity_pte) >> PAGE_SHIFT);
for (i = 0; i < PTRS_PER_PTE; i++) {
set_pte(p2m_missing_pte + i,
-   pfn_pte(PFN_DOWN(__pa(p2m_missing)), PAGE_KERNEL));
+   pfn_pte(PFN_DOWN(__pa(p2m_missing)), PAGE_KERNEL_RO));
set_pte(p2m_identity_pte + i,
-   pfn_pte(PFN_DOWN(__pa(p2m_identity)), PAGE_KERNEL));
+   pfn_pte(PFN_DOWN(__pa(p2m_identity)), PAGE_KERNEL_RO));
}
 
for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += chunk) {
@@ -365,7 +366,7 @@ static void __init xen_rebuild_p2m_list(unsigned long *p2m)
p2m_missing : p2m_identity;
ptep = populate_extra_pte((unsigned long)(p2m + pfn));
set_pte(ptep,
-   pfn_pte(PFN_DOWN(__pa(mfns)), PAGE_KERNEL));
+   pfn_pte(PFN_DOWN(__pa(mfns)), PAGE_KERNEL_RO));
continue;
}
 
@@ -624,6 +625,9 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long 
mfn)
return true;
}
 
+   if (likely(!__put_user(mfn, xen_p2m_addr + pfn)))
+   return true;
+
ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level);
BUG_ON(!ptep || level != PG_LEVEL_4K);
 
@@ -633,9 +637,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long 
mfn)
if (pte_pfn(*ptep) == PFN_DOWN(__pa(p2m_identity)))
return mfn == IDENTITY_FRAME(pfn);
 
-   xen_p2m_addr[pfn] = mfn;
-
-   return true;
+   return false;
 }
 
 bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
-- 
2.1.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Liang Li
If hardware support the 1GB pages, expose the feature to guest by
default. Users don't have to use a 'cpuid= ' option in config fil
e to turn it on.

If guest use shadow mode, the 1GB pages feature will be hidden from
guest, this is done in the function hvm_cpuid(). So the change is
okay for shadow mode case.

Signed-off-by: Liang Li 
Signed-off-by: Yang Zhang 
---
 tools/libxc/xc_cpuid_x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index a18b1ff..c97f91a 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -109,6 +109,7 @@ static void amd_xc_cpuid_policy(
 regs[3] &= (0x0183f3ff | /* features shared with 0x0001:EDX */
 bitmaskof(X86_FEATURE_NX) |
 bitmaskof(X86_FEATURE_LM) |
+bitmaskof(X86_FEATURE_PAGE1GB) |
 bitmaskof(X86_FEATURE_SYSCALL) |
 bitmaskof(X86_FEATURE_MP) |
 bitmaskof(X86_FEATURE_MMXEXT) |
@@ -192,6 +193,7 @@ static void intel_xc_cpuid_policy(
 bitmaskof(X86_FEATURE_ABM));
 regs[3] &= (bitmaskof(X86_FEATURE_NX) |
 bitmaskof(X86_FEATURE_LM) |
+bitmaskof(X86_FEATURE_PAGE1GB) |
 bitmaskof(X86_FEATURE_SYSCALL) |
 bitmaskof(X86_FEATURE_RDTSCP));
 break;
@@ -386,6 +388,7 @@ static void xc_cpuid_hvm_policy(
 clear_bit(X86_FEATURE_LM, regs[3]);
 clear_bit(X86_FEATURE_NX, regs[3]);
 clear_bit(X86_FEATURE_PSE36, regs[3]);
+clear_bit(X86_FEATURE_PAGE1GB, regs[3]);
 }
 break;
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] flask/policy: Updates for example XSM policy

2014-11-28 Thread Ian Campbell
On Wed, 2014-09-24 at 10:14 +0100, Ian Campbell wrote:
> On Tue, 2014-09-23 at 16:40 -0400, Daniel De Graaf wrote:
> > On 09/23/2014 11:30 AM, Ian Campbell wrote:
> > > On Tue, 2014-09-23 at 10:37 +0100, Wei Liu wrote:
> > >> On Tue, Sep 23, 2014 at 10:01:48AM +0100, Wei Liu wrote:
> > >>> On Mon, Sep 22, 2014 at 04:23:18PM -0400, Daniel De Graaf wrote:
> >  The example XSM policy was missing permission for dom0_t to migrate
> >  domains with label domU_t; add these permissions.
> > >
> > > Daniel, would you prefer to iterate until a full batch of fixes or shall
> > > I apply and expect "More updates for example XSM policy" later on?
> > >
> > 
> > I would prefer to iterate and apply the full set.
> 
> Ack.

I've just spotted this in my queue, did this full set ever happen? I
don't see it in tree of in my queue folder. Maybe the issue went away
some other way?

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] pygrub: Fix regression from c/s d1b93ea, attempt 2

2014-11-28 Thread Ian Campbell
On Tue, 2014-11-25 at 11:11 -0500, Boris Ostrovsky wrote:
> c/s d1b93ea causes substantial functional regressions in pygrub's ability to
> parse bootloader configuration files.
> 
> c/s d1b93ea itself changed an an interface which previously used exclusively
> integers, to using strings in the case of a grub configuration with explicit
> default set, along with changing the code calling the interface to require a
> string.  The default value for "default" remained as an integer.
> 
> As a result, any Extlinux or Lilo configuration (which drives this interface
> exclusively with integers), or Grub configuration which doesn't explicitly
> declare a default will die with an AttributeError when attempting to call
> "self.cf.default.isdigit()" where "default" is an integer.
> 
> Sadly, this AttributeError gets swallowed by the blanket ignore in the loop
> which searches partitions for valid bootloader configurations, causing the
> issue to be reported as "Unable to find partition containing kernel"
> 
> We should explicitly check type of "default" in image_index() and process it
> appropriately.
> 
> Reported-by: Andrew Cooper 
> Signed-off-by: Boris Ostrovsky 

Acked-by: Ian Campbell 

In general I would prefer the first line of the commit message to be a
short description of the actual functional change and not a reference to
fixing some other commit, which is basically meaningless to anyone
reading the log even now, nevermind in six months. I think I'm going to
start picking up on this more frequently from now on.

CCing Konrad for RM input. I'm much less worried about corner cases etc
wrt the freeze etc than I was with the larger fix proposed earlier.

> ---
> 
> Commit message is Andrew's with exception of the last sentense.
> 
> I only tested this with grub2.
> 
> -boris
> 
>  tools/pygrub/src/pygrub |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>  mode change 100644 => 100755 tools/pygrub/src/pygrub
> 
> diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
> old mode 100644
> new mode 100755
> index aa7e562..3ec52fd
> --- a/tools/pygrub/src/pygrub
> +++ b/tools/pygrub/src/pygrub
> @@ -457,7 +457,9 @@ class Grub:
>  self.cf.parse(buf)
>  
>  def image_index(self):
> -if self.cf.default.isdigit():
> +if isinstance(self.cf.default, int):
> +sel = self.cf.default
> +elif self.cf.default.isdigit():
>  sel = int(self.cf.default)
>  else:
>  # We don't fully support submenus. Look for the leaf value in



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 11:52,  wrote:
> If hardware support the 1GB pages, expose the feature to guest by
> default. Users don't have to use a 'cpuid= ' option in config fil
> e to turn it on.
> 
> If guest use shadow mode, the 1GB pages feature will be hidden from
> guest, this is done in the function hvm_cpuid(). So the change is
> okay for shadow mode case.
> 
> Signed-off-by: Liang Li 
> Signed-off-by: Yang Zhang 

Reviewed-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5 3/3] python/xs: Correct the indirection of the NULL xshandle() check

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote:
> The code now now matches its comment, and will actually catch the case of a
> bad xs handle.
> 
> Signed-off-by: Andrew Cooper 
> Coverity-ID: 1055948
> CC: Ian Campbell 
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Xen Coverity Team 

Acked-by: Ian Campbell 

> ---
>  tools/python/xen/lowlevel/xs/xs.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/python/xen/lowlevel/xs/xs.c 
> b/tools/python/xen/lowlevel/xs/xs.c
> index 84e1711..ec364bb 100644
> --- a/tools/python/xen/lowlevel/xs/xs.c
> +++ b/tools/python/xen/lowlevel/xs/xs.c
> @@ -816,7 +816,7 @@ static int parse_transaction_path(XsHandle *self, 
> PyObject *args,
>  
>  *xh = xshandle(self);
>  
> -if (!xh)
> +if (!*xh)
>  return 0;
>  
>  if (!PyArg_ParseTuple(args, "ss", &thstr, path))



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Andrew Cooper
Hi,

I have finally worked out why XenServer is having issues with its legacy
windows drivers and Xen-4.5.

XenServer needs to support hvm ops with an indicies of 0x102 and 0x103
to prevent our legacy windows drivers from BSODing on boot.  (These
problems will disappear when we can drop Windows XP/Server 2k3 support,
but that time is not now.)

The root cause of the breakage is c/s e8b87b57b "x86/HVM: fix preemption
handling in do_hvm_op() (try 2)", and in particular the HVMOP_op_mask,
which turns the above hypercalls into HVMOP_set_{pci_intx,isa_irq}_level
hypercalls.


>From my point of view, I can work around this with the knowledge that
HVMOP_set_{pci_intx,isa_irq}_level hypercalls are not eligible for
pre-emption.

However, it occurs to me that HVMOP_op_mask absolutely must live in the
public header files, as it is guest visible, and forms part of the ABI.

Consider a userspace task which falls into a continuation, is preempted
and paused by the guest kernel, the entire VM migrated, and the task
unpaused later.  If HVMOP_op_mask has changed in that time, the
continuation will become erroneous.

In particular, all hypercall continuation strategies we use *must* be
forward compatible when moving to newer versions of Xen, because Xen
cannot possibly guarantee that a continuation which starts will finish
on the same hypervisor.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xenstore.h clarifications

2014-11-28 Thread Ian Jackson
Ian Campbell writes ("Re: [Xen-devel] Xenstore.h clarifications"):
> I think there's a pretty good chance that the same applies to xenstore
> connections made over the device/shared-ring interface.

Yes.

> I'm not really sure about the semantics of a Unix domain socket after a
> fork, but I don't expect both parent and child could sanely make use of
> it.

>From the kernel's point of view everything is fine but of course the
protocol running through the socket would be quite messed up.  For
xenstore, while in theory you could take turns somehow, I don't think
suggesting that is sensible.

> So I think the answer is:
> 
>   * Connections made with xs_open(0) (which might be shared page or
> socket based) are only guaranteed to work in the parent after
> fork.
>   * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable
> in either the parent or the child after fork, but not both.
>   * xs_daemon_open*() and xs_domain_open() are deprecated synonyms
> for xs_open(0)
>   * XS_OPEN_READONLY has not bearing on any of this.
> 
> Ian, does that seem right?

Exactly, yes.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5 1/3] python/xc: Fix multiple issues in pyflask_context_to_sid()

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote:
> The error handling from a failed memory allocation should return
> PyErr_SetFromErrno(xc_error_obj); rather than simply calling it and continuing
> to the memcpy() below, with the dest pointer being NULL.
> 
> Furthermore, the context string is simply an input parameter to the hypercall,
> and is not mutated anywhere along the way.  The error handling elsewhere in
> the function can be simplified by not duplicating it to start with.
> 
> Signed-off-by: Andrew Cooper 
> Coverity-IDs: 1055305 1055721
> CC: Ian Campbell 
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Xen Coverity Team 

Acked-by: Ian Campbell 

This would have been far more obviously correct for 4.5 if you had stuck
to fixing the issue in the first paragraph.

> ---
>  tools/python/xen/lowlevel/xc/xc.c |   21 +++--
>  1 file changed, 3 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/python/xen/lowlevel/xc/xc.c 
> b/tools/python/xen/lowlevel/xc/xc.c
> index d95d459..c70b388 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -2126,8 +2126,6 @@ static PyObject *pyflask_context_to_sid(PyObject *self, 
> PyObject *args,
>  {
>  xc_interface *xc_handle;
>  char *ctx;
> -char *buf;
> -uint32_t len;
>  uint32_t sid;
>  int ret;
>  
> @@ -2137,28 +2135,15 @@ static PyObject *pyflask_context_to_sid(PyObject 
> *self, PyObject *args,
>&ctx) )
>  return NULL;
>  
> -len = strlen(ctx);
> -
> -buf = malloc(len);
> -if (!buf) {
> -errno = -ENOMEM;
> -PyErr_SetFromErrno(xc_error_obj);
> -}
> -
> -memcpy(buf, ctx, len);
> -
>  xc_handle = xc_interface_open(0,0,0);
>  if (!xc_handle) {
> -free(buf);
>  return PyErr_SetFromErrno(xc_error_obj);
>  }
> -
> -ret = xc_flask_context_to_sid(xc_handle, buf, len, &sid);
> -
> +
> +ret = xc_flask_context_to_sid(xc_handle, ctx, strlen(ctx), &sid);
> +
>  xc_interface_close(xc_handle);
>  
> -free(buf);
> -
>  if ( ret != 0 ) {
>  errno = -ret;
>  return PyErr_SetFromErrno(xc_error_obj);



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5 2/3] python/xc: Fix multiple issues in pyxc_readconsolering()

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote:
> Don't leak a 16k allocation if PyArg_ParseTupleAndKeywords() or the first
> xc_readconsolering() fail.  It is trivial to run throught the processes memory
> by repeatedly passing junk parameters to this function.
> 
> In the case that the call to xc_readconsolering() in the while loop fails,
> reinstate str before breaking out, and passing a spurious pointer to free().
> 
> Signed-off-by: Andrew Cooper 
> Coverity-IDs: 1054984 1055906
> CC: Ian Campbell 
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Xen Coverity Team 

Acked-by: Ian Campbell 

> ---
>  tools/python/xen/lowlevel/xc/xc.c |   13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/python/xen/lowlevel/xc/xc.c 
> b/tools/python/xen/lowlevel/xc/xc.c
> index c70b388..2aa0dc7 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -1089,7 +1089,7 @@ static PyObject *pyxc_readconsolering(XcObject *self,
>  {
>  unsigned int clear = 0, index = 0, incremental = 0;
>  unsigned int count = 16384 + 1, size = count;
> -char*str = malloc(size), *ptr;
> +char*str, *ptr;
>  PyObject*obj;
>  int  ret;
>  
> @@ -1097,15 +1097,17 @@ static PyObject *pyxc_readconsolering(XcObject *self,
>  
>  if ( !PyArg_ParseTupleAndKeywords(args, kwds, "|iii", kwd_list,
>&clear, &index, &incremental) ||
> - !str )
> + !(str = malloc(size)) )
>  return NULL;
>  
>  ret = xc_readconsolering(self->xc_handle, str, &count, clear,
>   incremental, &index);
> -if ( ret < 0 )
> +if ( ret < 0 ) {
> +free(str);
>  return pyxc_error_to_exception(self->xc_handle);
> +}
>  
> -while ( !incremental && count == size )
> +while ( !incremental && count == size && ret >= 0 )
>  {
>  size += count - 1;
>  if ( size < count )
> @@ -1119,9 +1121,6 @@ static PyObject *pyxc_readconsolering(XcObject *self,
>  count = size - count;
>  ret = xc_readconsolering(self->xc_handle, str, &count, clear,
>   1, &index);
> -if ( ret < 0 )
> -break;
> -
>  count += str - ptr;
>  str = ptr;
>  }



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5 1/3] python/xc: Fix multiple issues in pyflask_context_to_sid()

2014-11-28 Thread Andrew Cooper
On 28/11/14 11:37, Ian Campbell wrote:
> On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote:
>> The error handling from a failed memory allocation should return
>> PyErr_SetFromErrno(xc_error_obj); rather than simply calling it and 
>> continuing
>> to the memcpy() below, with the dest pointer being NULL.
>>
>> Furthermore, the context string is simply an input parameter to the 
>> hypercall,
>> and is not mutated anywhere along the way.  The error handling elsewhere in
>> the function can be simplified by not duplicating it to start with.
>>
>> Signed-off-by: Andrew Cooper 
>> Coverity-IDs: 1055305 1055721
>> CC: Ian Campbell 
>> CC: Ian Jackson 
>> CC: Wei Liu 
>> CC: Xen Coverity Team 
> Acked-by: Ian Campbell 
>
> This would have been far more obviously correct for 4.5 if you had stuck
> to fixing the issue in the first paragraph.

Hmm - I appear to have missed a detail in the description.  One of the
CIDs is to do with putting a string in a new buffer without a NUL
terminator, and passing it as a char* into xc_flask_context_to_sid; the
issue being that anyone expecting things like strlen() to work will be
in for a nasty shock.

One solution to this was strdup(), but as the duplication was
unnecessary anyway, it was easier just to drop it all.

~Andrew

>
>> ---
>>  tools/python/xen/lowlevel/xc/xc.c |   21 +++--
>>  1 file changed, 3 insertions(+), 18 deletions(-)
>>
>> diff --git a/tools/python/xen/lowlevel/xc/xc.c 
>> b/tools/python/xen/lowlevel/xc/xc.c
>> index d95d459..c70b388 100644
>> --- a/tools/python/xen/lowlevel/xc/xc.c
>> +++ b/tools/python/xen/lowlevel/xc/xc.c
>> @@ -2126,8 +2126,6 @@ static PyObject *pyflask_context_to_sid(PyObject 
>> *self, PyObject *args,
>>  {
>>  xc_interface *xc_handle;
>>  char *ctx;
>> -char *buf;
>> -uint32_t len;
>>  uint32_t sid;
>>  int ret;
>>  
>> @@ -2137,28 +2135,15 @@ static PyObject *pyflask_context_to_sid(PyObject 
>> *self, PyObject *args,
>>&ctx) )
>>  return NULL;
>>  
>> -len = strlen(ctx);
>> -
>> -buf = malloc(len);
>> -if (!buf) {
>> -errno = -ENOMEM;
>> -PyErr_SetFromErrno(xc_error_obj);
>> -}
>> -
>> -memcpy(buf, ctx, len);
>> -
>>  xc_handle = xc_interface_open(0,0,0);
>>  if (!xc_handle) {
>> -free(buf);
>>  return PyErr_SetFromErrno(xc_error_obj);
>>  }
>> -
>> -ret = xc_flask_context_to_sid(xc_handle, buf, len, &sid);
>> -
>> +
>> +ret = xc_flask_context_to_sid(xc_handle, ctx, strlen(ctx), &sid);
>> +
>>  xc_interface_close(xc_handle);
>>  
>> -free(buf);
>> -
>>  if ( ret != 0 ) {
>>  errno = -ret;
>>  return PyErr_SetFromErrno(xc_error_obj);
>



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote:
> state at the GIC level. This would also avoid masking the output signal
> and requires specific handling in the guest OS.

"which requires"?

It doesn't seem quite right to me otherwise, since context switching the
virq state *removes* the need to have the guest do anything other than
what it would do on native.

Assuming this is what you meant I propose (fixing some grammar etc as I
go):

xen/arm: Handle platforms with edge-triggered virtual timer

Some platforms (such as the ARMv8 model) use an edge-triggered interrupt
for the virtual timer. Even if the timer output signal is masked in the
context switch, the GIC will keep track that of any interrupts raised
while IRQs are disabled. As soon as IRQs are re-enabled, the virtual
interrupt timer will be injected to Xen.

If an idle vVCPU was scheduled next then the interrupt handler doesn't
expect to the receive the IRQ and will crash:

(XEN)[<00228388>] _spin_lock_irqsave+0x28/0x94 (PC)
(XEN)[<00228380>] _spin_lock_irqsave+0x20/0x94 (LR)
(XEN)[<00250510>] vgic_vcpu_inject_irq+0x40/0x1b0
(XEN)[<0024bcd0>] vtimer_interrupt+0x4c/0x54
(XEN)[<00247010>] do_IRQ+0x1a4/0x220
(XEN)[<00244864>] gic_interrupt+0x50/0xec
(XEN)[<0024fbac>] do_trap_irq+0x20/0x2c
(XEN)[<00255240>] hyp_irq+0x5c/0x60
(XEN)[<00241084>] context_switch+0xb8/0xc4
(XEN)[<0022482c>] schedule+0x684/0x6d0
(XEN)[<0022785c>] __do_softirq+0xcc/0xe8
(XEN)[<002278d4>] do_softirq+0x14/0x1c
(XEN)[<00240fac>] idle_loop+0x134/0x154
(XEN)[<0024c160>] start_secondary+0x14c/0x15c
(XEN)[<0001>] 0001

The proper solution is to context switch the virtual interrupt state at
the GIC level. This would also avoid masking the output signal which
requires specific handling in the guest OS and more complex code in Xen
to deal with EOIs, and so is desirable for that reason too. 

Sadly, this solution requires some refactoring which would not be
suitable for a freeze exception for the Xen 4.5 release.

For now implement a temporary solution which ignores the virtual timer
interrupt when the idle VCPU is running.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 08:32 +, M A Young wrote:
> On Fri, 28 Nov 2014, Ian Campbell wrote:
> 
> > On Fri, 2014-11-28 at 00:28 +, M A Young wrote:
> >> Migrations with xl migrate --debug will fail because debugging information
> >> from the receiving process is written to the stdout channel. This channel
> >> is also used for status messages so the migration will fail as the sending
> >> process receives an unexpected message. This patch moves the debugging
> >> information to the stderr channel.
> >>
> >> Version 2 moves some output back to stdout that was accidentally moved
> >> to stderr in the first version.
> >>
> >> Signed-off-by: Michael Young 
> >
> > I think you've forgotten the attachment.
> 
> Yes, I did. Here it is.

Acked-by: Ian Campbell 

It's pretty big but a large chunk is pretty mechanical. Were you
intending this for 4.5? (Konrad CCd).

> Use stderr for debugging messages for xl migrate --debug as stdout is used
> for status messages.
> 
> --- xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c.orig 2014-10-24 15:22:40.0 
> +0100
> +++ xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c  2014-11-26 22:41:41.697043321 
> +
> @@ -380,10 +380,10 @@
>  }
>  static void printf_info(enum output_format output_format,
>  int domid,
> -libxl_domain_config *d_config)
> +libxl_domain_config *d_config, FILE *fh)
>  {
>  if (output_format == OUTPUT_FORMAT_SXP)
> -return printf_info_sexp(domid, d_config);
> +return printf_info_sexp(domid, d_config, fh);
>  
>  const char *buf;
>  libxl_yajl_length len = 0;
> @@ -404,7 +404,7 @@
>  if (s != yajl_gen_status_ok)
>  goto out;
>  
> -puts(buf);
> +fputs(buf, fh);
>  
>  out:
>  yajl_gen_free(hand);
> @@ -413,7 +413,13 @@
>  fprintf(stderr,
>  "unable to format domain config as JSON (YAJL:%d)\n", s);
>  
> -if (ferror(stdout) || fflush(stdout)) { perror("stdout"); exit(-1); }
> +if (ferror(fh) || fflush(fh)) {
> +if (fh == stdout)
> +perror("stdout");
> +else
> +perror("stderr");
> +exit(-1);
> +}
>  }
>  
>  static int do_daemonize(char *name)
> @@ -2423,7 +2429,7 @@
>  }
>  
>  if (!dom_info->quiet)
> -printf("Parsing config from %s\n", config_source);
> +fprintf(stderr, "Parsing config from %s\n", config_source);
>  
>  if (config_in_json) {
>  libxl_domain_config_from_json(ctx, &d_config,
> @@ -2451,7 +2457,7 @@
>  }
>  
>  if (debug || dom_info->dryrun)
> -printf_info(default_output_format, -1, &d_config);
> +printf_info(default_output_format, -1, &d_config, stderr);
>  
>  ret = 0;
>  if (dom_info->dryrun)
> @@ -3403,7 +3409,7 @@
>  if (default_output_format == OUTPUT_FORMAT_JSON)
>  s = printf_info_one_json(hand, info[i].domid, &d_config);
>  else
> -printf_info_sexp(info[i].domid, &d_config);
> +printf_info_sexp(info[i].domid, &d_config, stdout);
>  libxl_domain_config_dispose(&d_config);
>  if (s != yajl_gen_status_ok)
>  goto out;
> @@ -4725,7 +4731,7 @@
>  parse_config_data(filename, config_data, config_len, &d_config);
>  
>  if (debug || dryrun_only)
> -printf_info(default_output_format, -1, &d_config);
> +printf_info(default_output_format, -1, &d_config, stdout);
>  
>  if (!dryrun_only) {
>  fprintf(stderr, "setting dom%d configuration\n", domid);
> --- xen-4.5.0-rc1/tools/libxl/xl_sxp.c.orig 2014-10-24 15:22:40.0 
> +0100
> +++ xen-4.5.0-rc1/tools/libxl/xl_sxp.c  2014-11-26 22:30:58.416394082 +
> @@ -30,7 +30,7 @@
>  /* In general you should not add new output to this function since it
>   * is intended only for legacy use.
>   */
> -void printf_info_sexp(int domid, libxl_domain_config *d_config)
> +void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh)
>  {
>  int i;
>  libxl_dominfo info;
> @@ -38,195 +38,195 @@
>  libxl_domain_create_info *c_info = &d_config->c_info;
>  libxl_domain_build_info *b_info = &d_config->b_info;
>  
> -printf("(domain\n\t(domid %d)\n", domid);
> -printf("\t(create_info)\n");
> -printf("\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM);
> -printf("\t(hap %s)\n", libxl_defbool_to_string(c_info->hap));
> -printf("\t(oos %s)\n", libxl_defbool_to_string(c_info->oos));
> -printf("\t(ssidref %d)\n", c_info->ssidref);
> -printf("\t(name %s)\n", c_info->name);
> +fprintf(fh, "(domain\n\t(domid %d)\n", domid);
> +fprintf(fh, "\t(create_info)\n");
> +fprintf(fh, "\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM);
> +fprintf(fh, "\t(hap %s)\n", libxl_defbool_to_string(c_info->hap));
> +fprintf(fh, "\t(oos %s)\n", libxl_defbool_to_string(c_info->oos));
> +fprintf(fh, "\t(ssidref %d)\n", c_info->ssidref);
> +fprintf(fh, "\t(name %s

Re: [Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 18:52 +0800, Liang Li wrote:
> If hardware support the 1GB pages, expose the feature to guest by
> default. Users don't have to use a 'cpuid= ' option in config fil
> e to turn it on.
> 
> If guest use shadow mode, the 1GB pages feature will be hidden from
> guest, this is done in the function hvm_cpuid(). So the change is
> okay for shadow mode case.
> 
> Signed-off-by: Liang Li 
> Signed-off-by: Yang Zhang 

FTR although this is strictly speaking a toolstack patch I think the
main ack required should be from the x86 hypervisor guys...

> ---
>  tools/libxc/xc_cpuid_x86.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index a18b1ff..c97f91a 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -109,6 +109,7 @@ static void amd_xc_cpuid_policy(
>  regs[3] &= (0x0183f3ff | /* features shared with 0x0001:EDX */
>  bitmaskof(X86_FEATURE_NX) |
>  bitmaskof(X86_FEATURE_LM) |
> +bitmaskof(X86_FEATURE_PAGE1GB) |
>  bitmaskof(X86_FEATURE_SYSCALL) |
>  bitmaskof(X86_FEATURE_MP) |
>  bitmaskof(X86_FEATURE_MMXEXT) |
> @@ -192,6 +193,7 @@ static void intel_xc_cpuid_policy(
>  bitmaskof(X86_FEATURE_ABM));
>  regs[3] &= (bitmaskof(X86_FEATURE_NX) |
>  bitmaskof(X86_FEATURE_LM) |
> +bitmaskof(X86_FEATURE_PAGE1GB) |
>  bitmaskof(X86_FEATURE_SYSCALL) |
>  bitmaskof(X86_FEATURE_RDTSCP));
>  break;
> @@ -386,6 +388,7 @@ static void xc_cpuid_hvm_policy(
>  clear_bit(X86_FEATURE_LM, regs[3]);
>  clear_bit(X86_FEATURE_NX, regs[3]);
>  clear_bit(X86_FEATURE_PSE36, regs[3]);
> +clear_bit(X86_FEATURE_PAGE1GB, regs[3]);
>  }
>  break;
>  



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass

2014-11-28 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [xen-4.4-testing test] 31882: regressions 
- trouble: blocked/broken/fail/pass"):
> On 28.11.14 at 10:07,  wrote:
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 
> > 31781
> 
> Wasn't the swiotlb problem supposed to be dealt with by now?
> swiotlb=65536 looks to be in place here, yet that still appears to
> not be big enough...

That was just an attempted workaround, which we knew might or might
not work.  David Vrabel posted some patches to lmkl which are supposed
to properly fix it.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug

2014-11-28 Thread Ian Jackson
Ian Campbell writes ("Re: [PATCH v2] fix migration failure with xl migrate 
--debug"):
> Acked-by: Ian Campbell 

Thanks for reviewing it :-).

> It's pretty big but a large chunk is pretty mechanical. Were you
> intending this for 4.5? (Konrad CCd).

I think (based on reading the thread but not the code) that this ought
to be in 4.5.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] missing chunk of HVM direct kernel boot patch

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 13:55 +0800, Chunyan Liu wrote:
> Found by Stefano, this chunk of the patch was never applied to
> xen-unstable (commit 11dffa2359e8a2629490c14c029c7c7c777b3e47),
> see http://marc.info/?l=qemu-devel&m=140471493425353&w=2.

How strange, "git am" usually makes it pretty difficult to miss hunks.
Sorry about this.

> Signed-off-by: Chunyan Liu 

Acked-by: Ian Campbell 

I'm afraid that despite the circumstances this still needs a release ack
from Konrad. Obviously the upside is fixing a partially implemented
feature, but I'm not sure what the downsides are.

Has this been tested with stubdoms, including when this feature is not
used? My biggest concern is that because this function is also used to
build the command line for the stubdom and the stubdom is PV and hence
has at least a ->kernel setting then this new code might break that use
case, by adding these options when they are not wanted. This path is all
a bit tangled so I'm not 100% sure if those fields are actually set or
not.

> ---
>  tools/libxl/libxl_dm.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 3e191c3..b25b574 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -527,6 +527,15 @@ static char ** 
> libxl__build_device_model_args_new(libxl__gc *gc,
>  if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
>  int ioemu_nics = 0;
>  
> +if (b_info->kernel)
> +flexarray_vappend(dm_args, "-kernel", b_info->kernel, NULL);
> +
> +if (b_info->ramdisk)
> +flexarray_vappend(dm_args, "-initrd", b_info->ramdisk, NULL);
> +
> +if (b_info->cmdline)
> +flexarray_vappend(dm_args, "-append", b_info->cmdline, NULL);
> +
>  if (b_info->u.hvm.serial || b_info->u.hvm.serial_list) {
>  if ( b_info->u.hvm.serial && b_info->u.hvm.serial_list )
>  {



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5 0/3] Coverity fixes for python lowlevel libraries

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote:

> While Xend is certainly dead and gone, XenServer at the very least still has
> consumers of these python libraries.

AIUI XS mainly uses the xs.c stuff, with the xc.c stuff being mainly
debug utilities.

This is important because while the xs.c is simple and obviously correct
the xc.c patches are a little more involved.

> 
> Konrad: I am requesting a release ack for this.  All 5 issues are bugs with
> the handling of error cases, rather than with the basic functionality
> provided.  With these changes, Coverity is of the opinion that the python
> libraries are perfect (0 issues), and I feel this is a worthy position to be
> in for 4.5
> 
> Andrew Cooper (3):
>   python/xc: Fix multiple issues in pyflask_context_to_sid()
>   python/xc: Fix multiple issues in pyxc_readconsolering()
>   python/xs: Correct the indirection of the NULL xshandle() check
> 
>  tools/python/xen/lowlevel/xc/xc.c |   34 +-
>  tools/python/xen/lowlevel/xs/xs.c |2 +-
>  2 files changed, 10 insertions(+), 26 deletions(-)
> 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] fix segfault in xl migrate --debug

2014-11-28 Thread Ian Campbell
On Wed, 2014-11-26 at 21:19 +, Andrew Cooper wrote:
> On 26/11/2014 19:54, M A Young wrote:
> 
> > If differences are found during the verification phase of xl migrate
> > --debug then it is likely to crash with a segfault because the
> > bogus 
> > pagebuf->pfn_types[pfn] is used in a print statement instead of
> > pfn_type[pfn] . 
> > 
> > Signed-off-by: Michael Young 
> > 
> > 
> > 
> 
> Reviewed-by: Andrew Cooper 

Acked-by: Ian Campbell 

Needs a release ack if this is to be for 4.5, Konrad CCd.

On the one hand this fixes an issue which is only present if you enable
debug/verify mode, so it's not that critical. On the other hand it only
touches code which is used if you enable debug/verify mode, so it's not
that risky.

I'm inclined towards the apply it for 4.5 end of the scale...

> 
> > xl migrate --debug can segfault because pagebuf->pfn_types[pfn] is
> > used in a print statement instead of pfn_type[pfn] 
> > 
> > --- xen-4.5.0-rc1/tools/libxc/xc_domain_restore.c.orig  2014-10-24 
> > 15:22:40.0 +0100
> > +++ xen-4.5.0-rc1/tools/libxc/xc_domain_restore.c   2014-11-25 
> > 21:01:16.604081467 +
> > @@ -1404,7 +1404,7 @@
> >  int v;
> >  
> >  DPRINTF("** pfn=%lx type=%lx gotcs=%08lx "
> > -"actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
> > +"actualcs=%08lx\n", pfn, pfn_type[pfn],
> >  csum_page(region_base + i * PAGE_SIZE),
> >  csum_page(buf));
> >  
> 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-xen-4.5] libxl: Allow copying smaller bitmap into a larger one

2014-11-28 Thread Ian Campbell
On Tue, 2014-11-25 at 10:54 -0500, Konrad Rzeszutek Wilk wrote:
> > >>Reported-by: Boris Ostrovsky 
> > >>Signed-off-by: Wei Liu 
> > >>Cc: Ian Campbell 
> > >>Cc: Ian Jackson 
> > >>Cc: Dario Faggioli 
> > >>
> > >If this end up being the approach, it can have the following:
> > >
> > >Reviewed-by: Dario Faggioli 
> > 
> > Tested-by: Boris Ostrovsky 
> 
> Release-Acked-by: Konrad Rzeszutek Wilk 

Acked +Applied.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] INSTALL: correct EXTRA_CFLAGS handling

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 10:17 -0500, Konrad Rzeszutek Wilk wrote:
> On November 27, 2014 4:26:26 AM EST, Olaf Hering  wrote:
> >The already documented configure patch was not applied.
> >Adjust documentation to describe existing behaviour.
> >
> >Signed-off-by: Olaf Hering 
> >Cc: Ian Campbell 
> >Cc: Ian Jackson 
> >Cc: Konrad Rzeszutek Wilk 
> 
> Reviewed-by: me.
> 
> Don't need an release ack for it.

Applied.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] tools/oxenstored: Fix | vs & error in fd event handling

2014-11-28 Thread Ian Campbell
On Wed, 2014-11-26 at 16:09 -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 26, 2014 at 08:44:41PM +, Dave Scott wrote:
> > 
> > > On 26 Nov 2014, at 18:41, Konrad Rzeszutek Wilk  
> > > wrote:
> > > 
> > > On Wed, Nov 26, 2014 at 06:24:11PM +, Dave Scott wrote:
> > >> 
> > >>> On 26 Nov 2014, at 15:38, Zheng Li  wrote:
> > >>> 
> > >>> On 26/11/2014 15:09, Andrew Cooper wrote:
> >  This makes fields 0 and 1 true more often than they should be, 
> >  resulting
> >  problems when handling events.
> > >>> 
> > >>> Indeed, looks like a mistake I made when rewriting the logic terms 
> > >>> lately. The result is POLLUP or POLLERR events being returned in more 
> > >>> categories than we'd interest. Thanks for fixing this!
> > >>> 
> > >>> Acked-by: Zheng Li 
> > >> 
> > >> This also looks fine to me
> > >> 
> > >> Acked-by: David Scott 
> > > 
> > > Would it be possible to get an Reviewed-by please?
> > 
> > I’ll certainly offer
> > 
> > Reviewed-by: David Scott 
> 
> OK, Release-Acked-by: Konrad Rzeszutek Wilk 

Applied, thanks.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated

2014-11-28 Thread Ian Campbell
On Tue, 2014-11-25 at 11:03 -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 25, 2014 at 03:58:33PM +, Ian Campbell wrote:
> > On Tue, 2014-11-25 at 10:51 -0500, Konrad Rzeszutek Wilk wrote:
> > > Right, so with the reassurance text added on:
> > > Release-Acked-by: Konrad Rzeszutek Wilk 
> > 
> > Thanks.
> > 
> > Chunyan, I propose to commit adding the following to the commit text of
> > "[PATCH 1/2 V3] remove domain field in xenstore backend dir":
> > 
> > The correct way to obtain a domain's name is via
> > libxl_domid_to_name(), or by reading
> > from /local/domain/$DOMID/name for toolstacks not using libxl.
> > 
> > Does that sound OK to you?
> 
> Yes.

Thanks, that was really a question for Chunyan, but rather than wait any
longer I've applied with that change.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-20 at 14:44 -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 20, 2014 at 03:48:17PM +, Wei Liu wrote:
> > On Thu, Nov 20, 2014 at 03:29:51PM +, Ian Campbell wrote:
> > > On Wed, 2014-11-19 at 21:24 +, Wei Liu wrote:
> > > > On Wed, Nov 19, 2014 at 04:08:46PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote:
> > > > > > Before this patch, pv guest video_memkb is -1, which is an invalid 
> > > > > > value.
> > > > > > And it will cause the xenstore 'memory/targe' calculation wrong:
> > > > > > 
> > > > > > memory/target = info->target_memkb - info->video_memkb
> > > > > 
> > > > > CC-ing the maintainers.
> > > > > 
> > > > > Is this an regression as compared to Xen 4.4 or is this also in Xen 
> > > > > 4.4?
> > > > > 
> > > > 
> > > > I don't think this is a regression, it has been broken for quite a
> > > > while.
> > > 
> > > I think so to.
> > > 
> > > On the other hand its a pretty clear bug to use video_memkb == -1 and
> > > we've seen that it causes real issues. The fix is also fairly obvious.
> > > I'm inclined towards suggesting we fix this in 4.5.
> > > 
> > 
> > I concur.
> 
> Lets do it then. RElease-Acked-by: Konrad Rzeszutek Wilk 
> 

APplied, thanks.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5 1/3] python/xc: Fix multiple issues in pyflask_context_to_sid()

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 11:47 +, Andrew Cooper wrote:
> On 28/11/14 11:37, Ian Campbell wrote:
> > On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote:
> >> The error handling from a failed memory allocation should return
> >> PyErr_SetFromErrno(xc_error_obj); rather than simply calling it and 
> >> continuing
> >> to the memcpy() below, with the dest pointer being NULL.
> >>
> >> Furthermore, the context string is simply an input parameter to the 
> >> hypercall,
> >> and is not mutated anywhere along the way.  The error handling elsewhere in
> >> the function can be simplified by not duplicating it to start with.
> >>
> >> Signed-off-by: Andrew Cooper 
> >> Coverity-IDs: 1055305 1055721
> >> CC: Ian Campbell 
> >> CC: Ian Jackson 
> >> CC: Wei Liu 
> >> CC: Xen Coverity Team 
> > Acked-by: Ian Campbell 
> >
> > This would have been far more obviously correct for 4.5 if you had stuck
> > to fixing the issue in the first paragraph.
> 
> Hmm - I appear to have missed a detail in the description.  One of the
> CIDs is to do with putting a string in a new buffer without a NUL
> terminator, and passing it as a char* into xc_flask_context_to_sid; the
> issue being that anyone expecting things like strlen() to work will be
> in for a nasty shock.

ISTR a discussion of this interface in Julien's device-tree passthrough
thing a while back and some sort of conclusion that the hypervisor was
supposed to use the len field and not rely on the NULL.

I'm not sure that necessarily invalidates what you are saying, since
even given that throwing a NULL on the end would be friendly to libxc
consumers if nothing else.

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xenstore: Clarify xs_open() semantics

2014-11-28 Thread Razvan Cojocaru
Added to the xs_open() comments in xenstore.h. The text has been
taken almost verbatim from a xen-devel email by Ian Campbell,
and confirmed as accurate by Ian Jackson.

Signed-off-by: Razvan Cojocaru 
Suggested-off-by: Ian Campbell 
---
 tools/xenstore/include/xenstore.h |   12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/include/xenstore.h 
b/tools/xenstore/include/xenstore.h
index fdf5e76..b4b113e 100644
--- a/tools/xenstore/include/xenstore.h
+++ b/tools/xenstore/include/xenstore.h
@@ -59,10 +59,20 @@ typedef uint32_t xs_transaction_t;
 /* On failure, these routines set errno. */
 
 /* Open a connection to the xs daemon.
- * Attempts to make a connection over the socket interface, 
+ * Attempts to make a connection over the socket interface,
  * and if it fails, then over the  xenbus interface.
  * Mode 0 specifies read-write access, XS_OPEN_READONLY for
  * read-only access.
+ *
+ * * Connections made with xs_open(0) (which might be shared page or
+ *   socket based) are only guaranteed to work in the parent after
+ *   fork.
+ * * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable
+ *   in either the parent or the child after fork, but not both.
+ * * xs_daemon_open*() and xs_domain_open() are deprecated synonyms
+ *   for xs_open(0).
+ * * XS_OPEN_READONLY has no bearing on any of this.
+ *
  * Returns a handle or NULL.
  */
 struct xs_handle *xs_open(unsigned long flags);
-- 
1.7.9.5


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH 0/4] Introduction of the patches.

2014-11-28 Thread Ian Jackson
longtao.pang writes ("[OSSTEST PATCH 0/4] Introduction of the patches."):
> We updated these patchs about adding Nested test job into OSSTest.

Thanks for your contribution.

Having some testing of nested HVM would be good.

But I'm not convinced that these patches take the right approach to
achieving that.  There seems to be a great deal of duplication of
code.  I think we should have a conversation about what moving parts
are necessary for nested HVM testing.

I would guess that you could reuse ts-debian-hvm-install for the
initial install of the L1 guest, and then ts-xen-install to install
Xen on it.  Finally, you could run some other ts-* scripts to install
your L2 guests.  I think you would probably find that there are only
some small changes needed to make those existing scripts flexible
enough.

And I don't understand why you need to rebuild anything.  Surely the
already-built hypervisor and tools ought to work just as well in the
L1 guest ?

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xenstore: Clarify xs_open() semantics

2014-11-28 Thread Ian Jackson
Razvan Cojocaru writes ("[PATCH] xenstore: Clarify xs_open() semantics"):
> Added to the xs_open() comments in xenstore.h. The text has been
> taken almost verbatim from a xen-devel email by Ian Campbell,
> and confirmed as accurate by Ian Jackson.
> 
> Signed-off-by: Razvan Cojocaru 
> Suggested-off-by: Ian Campbell 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote:
> I propose to reword the commit message into:

You'll want to change the code comment too I think.

> 
> xen/arm: Handle platforms with edge-triggered virtual timer
> 
> Some platforms (such as the ARMv8 model) uses edge-triggered interrupt
> for the virtual timer. Even if the timer output signal is masked in the
> context switch, the GIC will keep track that of any raised interrupt
> while the IRQs has been disabled. As soon as the IRQs are re-enabled,
> the virtual interrupt timer will be injected to Xen.
> 
> The interrupt handler doesn't except to the receive the IRQ and will
> crash if an idle vCPU is running:
> 
> (XEN)[<00228388>] _spin_lock_irqsave+0x28/0x94 (PC)
> (XEN)[<00228380>] _spin_lock_irqsave+0x20/0x94 (LR)
> (XEN)[<00250510>] vgic_vcpu_inject_irq+0x40/0x1b0
> (XEN)[<0024bcd0>] vtimer_interrupt+0x4c/0x54
> (XEN)[<00247010>] do_IRQ+0x1a4/0x220
> (XEN)[<00244864>] gic_interrupt+0x50/0xec
> (XEN)[<0024fbac>] do_trap_irq+0x20/0x2c
> (XEN)[<00255240>] hyp_irq+0x5c/0x60
> (XEN)[<00241084>] context_switch+0xb8/0xc4
> (XEN)[<0022482c>] schedule+0x684/0x6d0
> (XEN)[<0022785c>] __do_softirq+0xcc/0xe8
> (XEN)[<002278d4>] do_softirq+0x14/0x1c
> (XEN)[<00240fac>] idle_loop+0x134/0x154
> (XEN)[<0024c160>] start_secondary+0x14c/0x15c
> (XEN)[<0001>] 0001
> 
> The proper solution would be context/switching the virtual interrupt
> state at the GIC level. This would also avoid masking the output signal
> and requires specific handling in the guest OS.
> 
> Sadly, this solution require some refactoring which would miss the Xen
> 4.5 release.
> 
> For now implement a temporary solution which ignore the interrupt when
> the idle VCPU is running.
> 
> Regards,
> 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/6] libxl: events: Assert that libxl_ctx_free is not called from a hook

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> No-one in their right mind would do this, and if they did everything
> would definitely collapse.  Arrange that if this happens, we crash
> ASAP.
> 
> Signed-off-by: Ian Jackson 

Acked-by: Ian Campbell 

> ---
>  tools/libxl/libxl.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index de23fec..c111f27 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -148,6 +148,8 @@ int libxl_ctx_free(libxl_ctx *ctx)
>  {
>  if (!ctx) return 0;
>  
> +assert(!ctx->osevent_in_hook);
> +
>  int i;
>  GC_INIT(ctx);
>  



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [seabios test] 31885: tolerable FAIL - PUSHED

2014-11-28 Thread xen . org
flight 31885 seabios real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31885/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-sedf-pin  7 debian-install  fail pass in 31857
 test-amd64-amd64-xl-pcipt-intel  5 xen-bootfail in 31857 pass in 31885
 test-amd64-i386-pair  7 xen-boot/src_host  fail in 31857 pass in 31885

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 31849

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass

version targeted for testing:
 seabios  b7f4a76a929ce4acd60e89aa273a8b208daa8233
baseline version:
 seabios  9f505f715793d99235bd6b4afb2ca7b96ba5729b


People who touched revisions under test:
  Gerd Hoffmann 


jobs:
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-amd64-xl-pcipt-intel  fail
 test-amd64-i386-rhel6hvm-intel   pass
 test-amd64-i386-qemut-rhel6hvm-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-libvirt fail
 test-amd64-i386-libvirt  fail
 test-amd64-i386-xl-multivcpu pass
 test-amd64-amd64-pairpass
 test-amd64-i386-pair fail
 test-amd64-amd64-xl-sedf-pin fail
 test-amd64-amd64-xl-sedf   

Re: [Xen-devel] [PATCH 3/6] libxl: events: Deregister, don't just modify, sigchld pipe fd

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> We want to have no fd events registered when we are idle.  This
> implies that we must be able to deregister our interest in the sigchld
> self-pipe fd, not just modify to request no events.
> 
> Signed-off-by: Ian Jackson 

Acked-by: Ian Campbell 

I think in principal there is now some redundant code in
libxl__sigchld_needed which calls modify to set POLLIN if the fd is not
registered. I think it's redundant because now the fd is registered iff
it is looking for POLLIN.

> ---
>  tools/libxl/libxl_fork.c |9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
> index fa15095..144208a 100644
> --- a/tools/libxl/libxl_fork.c
> +++ b/tools/libxl/libxl_fork.c
> @@ -372,15 +372,8 @@ static void sigchld_user_remove(libxl_ctx *ctx) /* 
> idempotent */
>  
>  void libxl__sigchld_notneeded(libxl__gc *gc) /* non-reentrant, idempotent */
>  {
> -int rc;
> -
>  sigchld_user_remove(CTX);
> -
> -if (libxl__ev_fd_isregistered(&CTX->sigchld_selfpipe_efd)) {
> -rc = libxl__ev_fd_modify(gc, &CTX->sigchld_selfpipe_efd, 0);
> -if (rc)
> -libxl__ev_fd_deregister(gc, &CTX->sigchld_selfpipe_efd);
> -}
> +libxl__ev_fd_deregister(gc, &CTX->sigchld_selfpipe_efd);
>  }
>  
>  int libxl__sigchld_needed(libxl__gc *gc) /* non-reentrant, idempotent */



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model

2014-11-28 Thread Julien Grall
Hi Ian,

On 28/11/14 12:32, Ian Campbell wrote:
> On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote:
>> I propose to reword the commit message into:
> 
> You'll want to change the code comment too I think.

It was my plan. I will send a new version during the day.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model

2014-11-28 Thread Julien Grall
Hi Ian,

On 28/11/14 11:47, Ian Campbell wrote:
> On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote:
>> state at the GIC level. This would also avoid masking the output signal
>> and requires specific handling in the guest OS.
> 
> "which requires"?
> 
> It doesn't seem quite right to me otherwise, since context switching the
> virq state *removes* the need to have the guest do anything other than
> what it would do on native.

I though the "avoid" would apply for both "masking" and "requires".

> Assuming this is what you meant I propose (fixing some grammar etc as I
> go):


Thanks for the correction, I will use this version. Shall I put your
signed-off-by?

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/6] libxl: events: Tear down SIGCHLD machinery on ctx destruction

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> We want to have no fd events registered when we are idle.
> Also, we should put back the default SIGCHLD handler.  So:
> 
>  * In libxl_ctx_free, use libxl_childproc_setmode to set the mode to
>the default, which is libxl_sigchld_owner_libxl (ie `libxl owns
>SIGCHLD only when it has active children').
> 
>But of course there are no active children at libxl teardown so
>this results in libxl__sigchld_notneeded: the ctx loses its
>interest in SIGCHLD (unsetting the SIGCHLD handler if we were the
>last ctx) and deregisters the per-ctx selfpipe fd.
> 
>  * assert that this is the case: ie that we are no longer interested
>in the selfpipe.
> 
> Signed-off-by: Ian Jackson 

Acked-by: Ian Campbell 

> ---
>  tools/libxl/libxl.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 12a1013..785253d 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -162,11 +162,12 @@ int libxl_ctx_free(libxl_ctx *ctx)
>  while ((eject = LIBXL_LIST_FIRST(&CTX->disk_eject_evgens)))
>  libxl__evdisable_disk_eject(gc, eject);
>  
> +libxl_childproc_setmode(CTX,0,0);
>  for (i = 0; i < ctx->watch_nslots; i++)
>  assert(!libxl__watch_slot_contents(gc, i));
>  assert(!libxl__ev_fd_isregistered(&ctx->watch_efd));
>  libxl__ev_fd_deregister(gc, &ctx->evtchn_efd);
> -libxl__ev_fd_deregister(gc, &ctx->sigchld_selfpipe_efd);
> +assert(!libxl__ev_fd_isregistered(&ctx->sigchld_selfpipe_efd));
>  
>  /* Now there should be no more events requested from the application: */
>  



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 7/7] tools/tests: Remove superfluous and incomplete spinlock from xen-access

2014-11-28 Thread Razvan Cojocaru
On 11/12/2014 05:31 PM, Tamas K Lengyel wrote:
> The spin-lock implementation in the xen-access test program is implemented
> in a fashion that is actually incomplete. The x86 assembly that guarantees 
> that
> the lock is held by only one thread lacks the "lock;" instruction.
> 
> However, the spin-lock is not actually necessary in xen-access as it is not
> multithreaded. The presence of the faulty implementation of the lock in a non-
> mulithreaded environment is unnecessarily complicated for developers who are
> trying to follow this code as a guide in implementing their own applications.
> Thus, removing it from the code improves the clarity on the behavior of the
> system.
> 
> Signed-off-by: Tamas K Lengyel 
> ---
>  tools/tests/xen-access/xen-access.c | 68 
> -
>  1 file changed, 6 insertions(+), 62 deletions(-)
> 
> diff --git a/tools/tests/xen-access/xen-access.c 
> b/tools/tests/xen-access/xen-access.c
> index 3538323..428c459 100644
> --- a/tools/tests/xen-access/xen-access.c
> +++ b/tools/tests/xen-access/xen-access.c
> @@ -45,56 +45,6 @@
>  #define ERROR(a, b...) fprintf(stderr, a "\n", ## b)
>  #define PERROR(a, b...) fprintf(stderr, a ": %s\n", ## b, strerror(errno))
>  
> -/* Spinlock and mem event definitions */
> -
> -#define SPIN_LOCK_UNLOCKED 0
> -
> -#define ADDR (*(volatile long *) addr)
> -/**
> - * test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is atomic and cannot be reordered.
> - * It also implies a memory barrier.
> - */
> -static inline int test_and_set_bit(int nr, volatile void *addr)
> -{
> -int oldbit;
> -
> -asm volatile (
> -"btsl %2,%1\n\tsbbl %0,%0"
> -: "=r" (oldbit), "=m" (ADDR)
> -: "Ir" (nr), "m" (ADDR) : "memory");
> -return oldbit;
> -}
> -
> -typedef int spinlock_t;
> -
> -static inline void spin_lock(spinlock_t *lock)
> -{
> -while ( test_and_set_bit(1, lock) );
> -}
> -
> -static inline void spin_lock_init(spinlock_t *lock)
> -{
> -*lock = SPIN_LOCK_UNLOCKED;
> -}
> -
> -static inline void spin_unlock(spinlock_t *lock)
> -{
> -*lock = SPIN_LOCK_UNLOCKED;
> -}
> -
> -static inline int spin_trylock(spinlock_t *lock)
> -{
> -return !test_and_set_bit(1, lock);
> -}
> -
> -#define vm_event_ring_lock_init(_m)  spin_lock_init(&(_m)->ring_lock)
> -#define vm_event_ring_lock(_m)   spin_lock(&(_m)->ring_lock)
> -#define vm_event_ring_unlock(_m) spin_unlock(&(_m)->ring_lock)
> -
>  typedef struct vm_event {
>  domid_t domain_id;
>  xc_evtchn *xce_handle;
> @@ -102,7 +52,6 @@ typedef struct vm_event {
>  vm_event_back_ring_t back_ring;
>  uint32_t evtchn_port;
>  void *ring_page;
> -spinlock_t ring_lock;
>  } vm_event_t;
>  
>  typedef struct xenaccess {
> @@ -241,9 +190,6 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t 
> domain_id)
>  /* Set domain id */
>  xenaccess->vm_event.domain_id = domain_id;
>  
> -/* Initialise lock */
> -vm_event_ring_lock_init(&xenaccess->vm_event);
> -
>  /* Enable mem_access */
>  xenaccess->vm_event.ring_page =
>  xc_mem_access_enable(xenaccess->xc_handle,
> @@ -320,13 +266,14 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, 
> domid_t domain_id)
>  return NULL;
>  }
>  
> +/*
> + * Note that this function is not thread safe.
> + */
>  int get_request(vm_event_t *vm_event, vm_event_request_t *req)
>  {
>  vm_event_back_ring_t *back_ring;
>  RING_IDX req_cons;
>  
> -vm_event_ring_lock(vm_event);
> -
>  back_ring = &vm_event->back_ring;
>  req_cons = back_ring->req_cons;
>  
> @@ -338,18 +285,17 @@ int get_request(vm_event_t *vm_event, 
> vm_event_request_t *req)
>  back_ring->req_cons = req_cons;
>  back_ring->sring->req_event = req_cons + 1;
>  
> -vm_event_ring_unlock(vm_event);
> -
>  return 0;
>  }
>  
> +/*
> + * Note that this function is not thread safe.
> + */
>  static int put_response(vm_event_t *vm_event, vm_event_response_t *rsp)
>  {
>  vm_event_back_ring_t *back_ring;
>  RING_IDX rsp_prod;
>  
> -vm_event_ring_lock(vm_event);
> -
>  back_ring = &vm_event->back_ring;
>  rsp_prod = back_ring->rsp_prod_pvt;
>  
> @@ -361,8 +307,6 @@ static int put_response(vm_event_t *vm_event, 
> vm_event_response_t *rsp)
>  back_ring->rsp_prod_pvt = rsp_prod;
>  RING_PUSH_RESPONSES(back_ring);
>  
> -vm_event_ring_unlock(vm_event);
> -
>  return 0;
>  }

I've just now noticed that get_request() and put_response() only ever
return 0, so it makes no sense to check the return values later on, and
they should basically either be modified to be void functions or some
error checking should be added (not sure what that could be).


Regards,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 6/6] libxl: events: Document and enforce actual callbacks restriction

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> libxl_event_register_callbacks cannot reasonably be called while libxl
> is busy (has outstanding operations and/or enabled events).
> 
> This is because the previous spec implied (although not entirely
> clearly) that event hooks would not be called for existing fd and
> timeout interests.  There is thus no way to reliably ensure that libxl
> would get told about fds and timeouts which it became interested in
> beforehand.
> 
> So there have to be no such fds or timeouts, which means that the
> callbacks must only be registered or changed when the ctx is idle.
> 
> Document this restriction, and enforce it with a pair of asserts.
> 
> (It would be nicer, perhaps, to say that the application may not call
> libxl_osevent_register_hooks other than right after creating the ctx.
> But there are existing callers, including libvirt, who do it later -
> even after doing major operations such as domain creation.)
> 
> Signed-off-by: Ian Jackson 

Acked-by: Ian Campbell 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/6] libxl: events: Deregister evtchn fd when not needed

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> We want to have no fd events registered when we are idle.
> In this patch, deal with the evtchn fd:
> 
>  * Defer setup of the evtchn handle to the first use.
>  * Defer registration of the evtchn fd; register as needed on use.
>  * When cancelling an evtchn wait, or when wait setup fails, check
>whether there are now no evtchn waits and if so deregister the fd.
>  * On libxl teardown, the evtchn fd should therefore be unregistered.
>assert that this is the case.

Is there no locking required when registering/deregistering? Since there
are list manipulations in most of these places I presume it already
exists, but I thought I should check.

> 
> Signed-off-by: Ian Jackson 
> ---
>  tools/libxl/libxl.c  |4 +---
>  tools/libxl/libxl_event.c|   27 +++
>  tools/libxl/libxl_internal.h |1 +
>  3 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 785253d..e0db4eb 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -118,8 +118,6 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
>  rc = ERROR_FAIL; goto out;
>  }
>  
> -rc = libxl__ctx_evtchn_init(gc);
> -
>  *pctx = ctx;
>  return 0;
>  
> @@ -166,7 +164,7 @@ int libxl_ctx_free(libxl_ctx *ctx)
>  for (i = 0; i < ctx->watch_nslots; i++)
>  assert(!libxl__watch_slot_contents(gc, i));
>  assert(!libxl__ev_fd_isregistered(&ctx->watch_efd));
> -libxl__ev_fd_deregister(gc, &ctx->evtchn_efd);
> +assert(!libxl__ev_fd_isregistered(&ctx->evtchn_efd));
>  assert(!libxl__ev_fd_isregistered(&ctx->sigchld_selfpipe_efd));
>  
>  /* Now there should be no more events requested from the application: */
> diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
> index da0a20e..716f318 100644
> --- a/tools/libxl/libxl_event.c
> +++ b/tools/libxl/libxl_event.c
> @@ -721,7 +721,7 @@ static void evtchn_fd_callback(libxl__egc *egc, 
> libxl__ev_fd *ev,
>  
>  int libxl__ctx_evtchn_init(libxl__gc *gc) {
>  xc_evtchn *xce;
> -int rc, fd;
> +int rc;
>  
>  if (CTX->xce)
>  return 0;
> @@ -733,14 +733,10 @@ int libxl__ctx_evtchn_init(libxl__gc *gc) {
>  goto out;
>  }
>  
> -fd = xc_evtchn_fd(xce);
> -assert(fd >= 0);
> +CTX->evtchn_fd = xc_evtchn_fd(xce);
> +assert(CTX->evtchn_fd >= 0);

Given that you can always retrieve this no demand with xc_evtchn_fd(xce)
and that it is cheap do you need to stash it in the ctx?

> -rc = libxl_fd_set_nonblock(CTX, fd, 1);
> -if (rc) goto out;
> -
> -rc = libxl__ev_fd_register(gc, &CTX->evtchn_efd,
> -   evtchn_fd_callback, fd, POLLIN);
> +rc = libxl_fd_set_nonblock(CTX, CTX->evtchn_fd, 1);
>  if (rc) goto out;
>  
>  CTX->xce = xce;
> @@ -751,6 +747,12 @@ int libxl__ctx_evtchn_init(libxl__gc *gc) {
>  return rc;
>  }
>  
> +static void evtchn_check_fd_deregister(libxl__gc *gc)
> +{
> +if (CTX->xce && LIBXL_LIST_EMPTY(&CTX->evtchns_waiting))
> +libxl__ev_fd_deregister(gc, &CTX->evtchn_efd);
> +}
> +
>  int libxl__ev_evtchn_wait(libxl__gc *gc, libxl__ev_evtchn *evev)
>  {
>  int r, rc;
> @@ -758,6 +760,13 @@ int libxl__ev_evtchn_wait(libxl__gc *gc, 
> libxl__ev_evtchn *evev)
>  DBG("ev_evtchn=%p port=%d wait (was waiting=%d)",
>  evev, evev->port, evev->waiting);
>  
> +rc = libxl__ctx_evtchn_init(gc);
> +if (rc) goto out;
> +
> +rc = libxl__ev_fd_register(gc, &CTX->evtchn_efd,
> +   evtchn_fd_callback, CTX->evtchn_fd, POLLIN);
> +if (rc) goto out;

Do you not need to do this only if evtchns_waiting is currently empty or
the efd is idle?

> +
>  if (evev->waiting)
>  return 0;

If you hit this you leave the stuff above done. Which may or may not 
matter depending on the answer above.


Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5 0/6] libxl: events: Tear down fd interests when idle

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:30 +, Ian Jackson wrote:
> Konrad: here is a set of fixes for libxl which ought IMO to be in 4.5.
> See below, or the rest of the thread, for details.  It's broken up
> into 6 tiny patches for ease of review.

I tested an identical version of this series, just without the commit
logs, with libvirt on ARM. So in addition to my various acks:
 
Tested-by: Ian Campbell 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> * On libxl teardown, the watch fd should therefore be unregistered.
>   assert that this is the case.

A bunch of the patches in this series basically assume that the ctx is
idle when it is freed, i.e. it requires everything to be explicitly
cancelled rather than implicitly doing so on free.

I think this is a fine restriction, but it probably ought to be written
down.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 12:36,  wrote:
> However, it occurs to me that HVMOP_op_mask absolutely must live in the
> public header files, as it is guest visible, and forms part of the ABI.
> 
> Consider a userspace task which falls into a continuation, is preempted
> and paused by the guest kernel, the entire VM migrated, and the task
> unpaused later.  If HVMOP_op_mask has changed in that time, the
> continuation will become erroneous.
> 
> In particular, all hypercall continuation strategies we use *must* be
> forward compatible when moving to newer versions of Xen, because Xen
> cannot possibly guarantee that a continuation which starts will finish
> on the same hypervisor.

Hmm, I see the issue, but surfacing such implementation details
would not only be unfortunate, but eventually prevent us from
making future changes. Just recall the mem-op case where we had
to widen the continuation encoding mask at some point. Hence while
I can't immediately see a way to avoid the situation you describe
(and it doesn't even take a user space task in a preemptible kernel),
I think we should allow ourselves a little more time to find possible
solutions other than locking down these masks (really they don't
need to be exposed in the public headers, but we would need
them to not change if no other solution can be found).

One thing to note is that the _introduction_ of such a mask (as
has happened for hvm-op between 4.4 and 4.5) is not a problem
as long as the respective bits all being zero has no special
meaning. What I considered for mem-op a while ago though (and
then forgot again) was to refuse non-zero start_extent values
for any operations not making use of that mechanism. The same
would of course be applicable to gnttab-op and hvm-op.

What might additionally make this not that urgent an issue for 4.5
is that only XSM_DM_PRIV guarded operations are affected, and
I suppose a stubdom gets re-created on the target host (rather
than migrated) when its client migrates.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 12:48 +, Julien Grall wrote:
> Hi Ian,
> 
> On 28/11/14 11:47, Ian Campbell wrote:
> > On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote:
> >> state at the GIC level. This would also avoid masking the output signal
> >> and requires specific handling in the guest OS.
> > 
> > "which requires"?
> > 
> > It doesn't seem quite right to me otherwise, since context switching the
> > virq state *removes* the need to have the guest do anything other than
> > what it would do on native.
> 
> I though the "avoid" would apply for both "masking" and "requires".

I think it reads with the avoid binding tightly to the masking only.

Possibly s/requires/requiring/ would have also corrected the meaning to
what you intended, although I would have changed the "and" to "or" as
well to make it less ambiguous.

> > Assuming this is what you meant I propose (fixing some grammar etc as I
> > go):
> 
> 
> Thanks for the correction, I will use this version. Shall I put your
> signed-off-by?

I don't think that's needed, its was pretty small.

(i.e. I wouldn't have added my S-o-b if I did it on commit).

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 13:07 +, Jan Beulich wrote:
> I suppose a stubdom gets re-created on the target host (rather
> than migrated) when its client migrates.

Correct.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed

2014-11-28 Thread Ian Campbell
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> We want to have no fd events registered when we are idle.
> In this patch, deal with the xenstore watch fd:
> 
> * Track the total number of active watches.
> * When deregistering a watch, or when watch registration fails, check
>   whether there are now no watches and if so deregister the fd.
> * On libxl teardown, the watch fd should therefore be unregistered.
>   assert that this is the case.
> 
> Signed-off-by: Ian Jackson 

Acked-by: Ian Campbell 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Andrew Cooper
On 28/11/14 13:07, Jan Beulich wrote:
 On 28.11.14 at 12:36,  wrote:
>> However, it occurs to me that HVMOP_op_mask absolutely must live in the
>> public header files, as it is guest visible, and forms part of the ABI.
>>
>> Consider a userspace task which falls into a continuation, is preempted
>> and paused by the guest kernel, the entire VM migrated, and the task
>> unpaused later.  If HVMOP_op_mask has changed in that time, the
>> continuation will become erroneous.
>>
>> In particular, all hypercall continuation strategies we use *must* be
>> forward compatible when moving to newer versions of Xen, because Xen
>> cannot possibly guarantee that a continuation which starts will finish
>> on the same hypervisor.
> Hmm, I see the issue, but surfacing such implementation details
> would not only be unfortunate, but eventually prevent us from
> making future changes. Just recall the mem-op case where we had
> to widen the continuation encoding mask at some point. Hence while
> I can't immediately see a way to avoid the situation you describe
> (and it doesn't even take a user space task in a preemptible kernel),
> I think we should allow ourselves a little more time to find possible
> solutions other than locking down these masks (really they don't
> need to be exposed in the public headers, but we would need
> them to not change if no other solution can be found).

It is certainly unfortunate, but I don't see that we have any choice. 
The implementation details of continuations have already slipped into
the ABI.

>
> One thing to note is that the _introduction_ of such a mask (as
> has happened for hvm-op between 4.4 and 4.5) is not a problem
> as long as the respective bits all being zero has no special
> meaning. What I considered for mem-op a while ago though (and
> then forgot again) was to refuse non-zero start_extent values
> for any operations not making use of that mechanism. The same
> would of course be applicable to gnttab-op and hvm-op.
>
> What might additionally make this not that urgent an issue for 4.5
> is that only XSM_DM_PRIV guarded operations are affected, and
> I suppose a stubdom gets re-created on the target host (rather
> than migrated) when its client migrates.

The problem is with continuations which reuse the upper bits of the
input register, not with this HVMOP_op_mask specifically; the
HVMOP_op_mask simply adds to an existing problem.  This is something
which needs considering urgently because, as you identify above, we have
already suffered an accidental ABI breakage with the mem-op widening.

32bit HVM guests reliably form a continuation on every single iteration
of a continuable hypercall (e.g. decrease reservation, which is the base
of my XSA-111 PoC), making it trivial to construct a migrateable HVM
domain which exposes the issue.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2] Add support for Xen ARM guest on FreeBSD

2014-11-28 Thread Andrew Turner
On Sun, 23 Nov 2014 22:35:36 +
Julien Grall  wrote:
> Hello all,
> 
> At the beginning of the year, I have sent a first RFC to add support
> for FreeBSD on Xen ARM [1].
...
> Major changes in this new version:
>   * Add Device Tree support via Linux Boot ABI
>   * Add zImage support
>   * Netfront support
>   * Blkfront fixes
>   * DOM0 support (separate branch see below)
> 
> The former item is very hackish. I was wondering if there is another
> way to do it? Or maybe we should support FreeBSD Bootloader in ARM
> guest?

I think using the loader is the correct way to handle booting in Xen. It
allows us to relocate the dtb as required. It look like a zImage then
use the Xen console to interact with the user.

> 
> The patch series is divided in X parts:
>   * #1 - #14: Clean up and bug fixes for Xen. They can be
> applied without the rest of the series
>  * #15 - #19: Update Xen interface to 4.4 and fix
> compilation. It's required for ARM.
>  * #20 - #26: Update Xen code to support ARM
>  * #27 - #33: Rework the event channel code for supporting
> ARM. I will work with Royger to come with a common interface with x86
>  * #34 - #36: Add support for ARM in Xen code
>  * #37 - #46: ARM bug fixes and new features. Some of thoses 
> patches (#37 - #40) could be applied without the rest of the series
>  * #47 - #48: Add Xen ARM platform

I have committed patches 30 and 40 as they look good. I'm not familiar
with the code to review 37 or 38, however from my quick look at 38 I
appears _bus_dmamap_load_buffer does take in to account buflen and
dmat->maxsegsz when setting sgsize just not dmat->alignment.

...
> 
> TODO:
>   * Add SMP/PSCI support in FreeBSD. Could be useful other
> platform too

Adding PSCI support is on my TODO lost for arm64, however I don't
expect to get on ti in until early next year.

>   * Only FreeBSD to load anywhere. Currently there is a 2M
> alignment which require a patch in Xen.

If you use the loader this will be fixed as the loader can load the
kernel at the correct alignment.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/6] libxl: events: Deregister, don't just modify, sigchld pipe fd

2014-11-28 Thread Ian Jackson
Ian Campbell writes ("Re: [PATCH 3/6] libxl: events: Deregister, don't just 
modify, sigchld pipe fd"):
> On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> > We want to have no fd events registered when we are idle.  This
> > implies that we must be able to deregister our interest in the sigchld
> > self-pipe fd, not just modify to request no events.
> > 
> > Signed-off-by: Ian Jackson 
> 
> Acked-by: Ian Campbell 
> 
> I think in principal there is now some redundant code in
> libxl__sigchld_needed which calls modify to set POLLIN if the fd is not
> registered. I think it's redundant because now the fd is registered iff
> it is looking for POLLIN.

You're right.  (Although you mean `calls modify if the fd /is/
registered'.)

I wonder if it would be better to leave tidying that up until
post-4.5, in case we are wrong.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/6] libxl: events: Deregister, don't just modify, sigchld pipe fd

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 14:42 +, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 3/6] libxl: events: Deregister, don't just 
> modify, sigchld pipe fd"):
> > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> > > We want to have no fd events registered when we are idle.  This
> > > implies that we must be able to deregister our interest in the sigchld
> > > self-pipe fd, not just modify to request no events.
> > > 
> > > Signed-off-by: Ian Jackson 
> > 
> > Acked-by: Ian Campbell 
> > 
> > I think in principal there is now some redundant code in
> > libxl__sigchld_needed which calls modify to set POLLIN if the fd is not
> > registered. I think it's redundant because now the fd is registered iff
> > it is looking for POLLIN.
> 
> You're right.  (Although you mean `calls modify if the fd /is/
> registered'.)

Indeed.

> I wonder if it would be better to leave tidying that up until
> post-4.5, in case we are wrong.

Sure.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/6] libxl: events: Deregister evtchn fd when not needed

2014-11-28 Thread Ian Jackson
Ian Campbell writes ("Re: [PATCH 5/6] libxl: events: Deregister evtchn fd when 
not needed"):
> On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> > We want to have no fd events registered when we are idle.
> > In this patch, deal with the evtchn fd:
> > 
> >  * Defer setup of the evtchn handle to the first use.
> >  * Defer registration of the evtchn fd; register as needed on use.
> >  * When cancelling an evtchn wait, or when wait setup fails, check
> >whether there are now no evtchn waits and if so deregister the fd.
> >  * On libxl teardown, the evtchn fd should therefore be unregistered.
> >assert that this is the case.
> 
> Is there no locking required when registering/deregistering? Since there
> are list manipulations in most of these places I presume it already
> exists, but I thought I should check.

libxl__ev_evtchn_* is always called with the ctx lock held.

However, that they don't take the lock is contrary to the rules stated
for libxl__ev_* in the doc comment.  That should be fixed for 4.6.

libxl__ev_fd_* already take and release the lock to protect their own
data structures etc.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/6] libxl: events: Deregister evtchn fd when not needed

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 14:47 +, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 5/6] libxl: events: Deregister evtchn fd 
> when not needed"):
> > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> > > We want to have no fd events registered when we are idle.
> > > In this patch, deal with the evtchn fd:
> > > 
> > >  * Defer setup of the evtchn handle to the first use.
> > >  * Defer registration of the evtchn fd; register as needed on use.
> > >  * When cancelling an evtchn wait, or when wait setup fails, check
> > >whether there are now no evtchn waits and if so deregister the fd.
> > >  * On libxl teardown, the evtchn fd should therefore be unregistered.
> > >assert that this is the case.
> > 
> > Is there no locking required when registering/deregistering? Since there
> > are list manipulations in most of these places I presume it already
> > exists, but I thought I should check.
> 
> libxl__ev_evtchn_* is always called with the ctx lock held.

For the most part this is implicit due to the caller being in an ao
callback, right?

> However, that they don't take the lock is contrary to the rules stated
> for libxl__ev_* in the doc comment.  That should be fixed for 4.6.

OK.

> libxl__ev_fd_* already take and release the lock to protect their own
> data structures etc.
> 
> Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed

2014-11-28 Thread Ian Jackson
Ian Campbell writes ("Re: [PATCH 2/6] libxl: events: Deregister xenstore watch 
fd when not needed"):
> On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> > * On libxl teardown, the watch fd should therefore be unregistered.
> >   assert that this is the case.
> 
> A bunch of the patches in this series basically assume that the ctx is
> idle when it is freed, i.e. it requires everything to be explicitly
> cancelled rather than implicitly doing so on free.

libxl_ctx_free explicitly disables all the application-requested event
generators.  (free_disable_deaths and libxl__evdisable_disk_eject.)

Destroying the ctx during the execution of an asynchronous operation
is forbidden by this text in libxl.h (near line 813):
  * *ao_how does not need to remain valid after the initiating function
  * returns. All other parameters must remain valid for the lifetime of
  * the asynchronous operation, unless otherwise specified.
That implies that the ctx must remain valid during the ao, so it may
not be destroyed beforehand.

Those are the two ways that, even without any threads inside libxl, a
ctx can be other than idle.

It should be obvious to the application programmer that destroying the
ctx when there are other threads inside libxl is not going to work.
Indeed a programmer who tries to destroy the ctx when they have
threads which might be inside libxl cannot ensure that the ctx is
valid even on entry to libxl.

> I think this is a fine restriction, but it probably ought to be written
> down.

Does the above demonstrate that the existing restrictions are
documented ?  I'd rather avoid writing the restrictions twice if it
can be avoided - these docs are long enough as they are.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 14:56 +, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 2/6] libxl: events: Deregister xenstore 
> watch fd when not needed"):
> > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote:
> > > * On libxl teardown, the watch fd should therefore be unregistered.
> > >   assert that this is the case.
> > 
> > A bunch of the patches in this series basically assume that the ctx is
> > idle when it is freed, i.e. it requires everything to be explicitly
> > cancelled rather than implicitly doing so on free.
> 
> libxl_ctx_free explicitly disables all the application-requested event
> generators.  (free_disable_deaths and libxl__evdisable_disk_eject.)

So it does, I was looking for that before commenting but didn't see
those calls for what they were, despite the comment right there...

> Destroying the ctx during the execution of an asynchronous operation
> is forbidden by this text in libxl.h (near line 813):
>   * *ao_how does not need to remain valid after the initiating function
>   * returns. All other parameters must remain valid for the lifetime of
>   * the asynchronous operation, unless otherwise specified.
> That implies that the ctx must remain valid during the ao, so it may
> not be destroyed beforehand.
> 
> Those are the two ways that, even without any threads inside libxl, a
> ctx can be other than idle.
> 
> It should be obvious to the application programmer that destroying the
> ctx when there are other threads inside libxl is not going to work.
> Indeed a programmer who tries to destroy the ctx when they have
> threads which might be inside libxl cannot ensure that the ctx is
> valid even on entry to libxl.
> 
> > I think this is a fine restriction, but it probably ought to be written
> > down.
> 
> Does the above demonstrate that the existing restrictions are
> documented ?

Yes.

>   I'd rather avoid writing the restrictions twice if it
> can be avoided - these docs are long enough as they are.

Indeed.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough

2014-11-28 Thread Stefano Stabellini
create ^
title it QMP connection problems prevent libxl from calling 
libxl__device_pci_reset on domain shutdown
thanks

On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
> Hi,
> 
> While testing a patch for Konrad i was wondering why "libxl_pci.c: 
> libxl__device_pci_reset()"
> doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci 
> passthrough.
> xl didn't show any problems on the commandline so i never drawed much 
> attention 
> to it, but /var/log/xen/xl-guest.log shows:
> 
> Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
> Domain 19 has shut down, reason code 0 0x0
> Action for shutdown reason code 0 is destroy
> Domain 19 needs to be cleaned up: destroying the domain
> libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: Connection 
> reset by peer
> libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to connect to 
> QMP
> libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: 
> Connection reset by peer
> libxl: error: libxl_dm.c:1575:kill_device_model: Device Model already 
> exited
> Done. Exiting now
> 
> So it doesn't even get to calling  "libxl_pci.c: libxl__device_pci_reset()".
> 
> --
> Sander
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen/arm: Handle platforms with edge-triggered virtual timer

2014-11-28 Thread Julien Grall
Some platforms (such as Xgene and ARMv8 models) use an edge-triggered interrupt
for the virtual timer. Even if the timer output signal is masked in the
context switch, the GIC will keep track that of any interrupts raised
while IRQs are disabled. As soon as IRQs are re-enabled, the virtual
interrupt timer will be injected to Xen.

If an idle vVCPU was scheduled next then the interrupt handler doesn't
expect to the receive the IRQ and will crash:

(XEN)[<00228388>] _spin_lock_irqsave+0x28/0x94 (PC)
(XEN)[<00228380>] _spin_lock_irqsave+0x20/0x94 (LR)
(XEN)[<00250510>] vgic_vcpu_inject_irq+0x40/0x1b0
(XEN)[<0024bcd0>] vtimer_interrupt+0x4c/0x54
(XEN)[<00247010>] do_IRQ+0x1a4/0x220
(XEN)[<00244864>] gic_interrupt+0x50/0xec
(XEN)[<0024fbac>] do_trap_irq+0x20/0x2c
(XEN)[<00255240>] hyp_irq+0x5c/0x60
(XEN)[<00241084>] context_switch+0xb8/0xc4
(XEN)[<0022482c>] schedule+0x684/0x6d0
(XEN)[<0022785c>] __do_softirq+0xcc/0xe8
(XEN)[<002278d4>] do_softirq+0x14/0x1c
(XEN)[<00240fac>] idle_loop+0x134/0x154
(XEN)[<0024c160>] start_secondary+0x14c/0x15c
(XEN)[<0001>] 0001

The proper solution is to context switch the virtual interrupt state at
the GIC level. This would also avoid masking the output signal which
requires specific handling in the guest OS and more complex code in Xen
to deal with EOIs, and so is desirable for that reason too.

Sadly, this solution requires some refactoring which would not be
suitable for a freeze exception for the Xen 4.5 release.

For now implement a temporary solution which ignores the virtual timer
interrupt when the idle VCPU is running.

Signed-off-by: Julien Grall 

---

Changes in v2:
- Reword the commit message and comment in the code to explain the
real bug. Based on Ian's reword.
- Use unlikely

This patch is a bug fix candidate for Xen 4.5 and backport for Xen 4.4.
It affects at least Xgene platform and ARMv8 models where Xen may
randomly crash.

This patch don't inject the virtual timer interrupt if the current VCPU
is the idle one. For now, I think this patch is the safest way to resolve
the problem.

I will work on a proper solution for Xen 4.6.
---
 xen/arch/arm/time.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index a6436f1..471d7a9 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -169,6 +169,19 @@ static void timer_interrupt(int irq, void *dev_id, struct 
cpu_user_regs *regs)
 
 static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
+/*
+ * Edge-triggered interrupt can be used for the virtual timer. Even
+ * if the timer output signal is masked in the context switch, the
+ * GIC will keep track that of any interrupts raised while IRQS as
+ * disabled. As soon as IRQs are re-enabled, the virtual interrupt
+ * will be injected to Xen.
+ *
+ * If an IDLE vCPU was scheduled next then we should ignore the
+ * interrupt.
+ */
+if ( unlikely(is_idle_vcpu(current)) )
+return;
+
 current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
 WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
 vgic_vcpu_inject_irq(current, current->arch.virt_timer.irq);
-- 
2.1.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 14:55,  wrote:
> The problem is with continuations which reuse the upper bits of the
> input register, not with this HVMOP_op_mask specifically; the
> HVMOP_op_mask simply adds to an existing problem.  This is something
> which needs considering urgently because, as you identify above, we have
> already suffered an accidental ABI breakage with the mem-op widening.

Since we can't retroactively fix the mem-op widening, I still don't see
what's urgent here: As long as we don't change any of these masks,
nothing bad is going to happen. Of course one thing to consider with
this aspect in mind is whether to change the hvm-op or gnttab-op
masks again _before_ 4.5 goes out, based on whether we feel they're
wide enough for the (un)foreseeable future.

> 32bit HVM guests reliably form a continuation on every single iteration
> of a continuable hypercall (e.g. decrease reservation, which is the base
> of my XSA-111 PoC), making it trivial to construct a migrateable HVM
> domain which exposes the issue.

Hmm, I can't offhand see why that would be, but what you write
reads like you determined the reason? I ask because an unavoidable
use of continuations is certainly nice for making sure those code paths
get tested, but would be quite desirable to get eliminated at least for
non-debug builds.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0

2014-11-28 Thread Anthony Wright
We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.

Shortly after the upgrade we started to lose network connectivity to the
DomU a few times a day that required a reboot to fix. We see nothing in
the xen logs or xl dmesg, but when we looked at the dmesg output we saw
the following output for the two incidents we investigated in detail:

[69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
[69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
[69332.031069] br-default: port 2(vif4.0) entered disabled state


[824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
[824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
[824365.531191] br-default: port 2(vif9.0) entered disabled state

We have a very similar setup running on another machine with a 3.17.3
DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
machine. This is a test system rather than a production system so has a
different workload and fewer CPUs.

The piece of code that outputs the error is in
drivers/net/xen-netback/netback.c.

The DomU has 4000MB of RAM and 8 CPUs.

Any ideas?

Thanks,

Anthony.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0

2014-11-28 Thread Ian Campbell
On Fri, 2014-11-28 at 15:19 +, Anthony Wright wrote:
> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> 3.17.3

Is this a Debian kernel? In which case you might be seeing
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 , this will be
fixed in the next upload of the kernel, test binaries with the fixes are
referenced in the bug log.

Even if not Debian then you'll probably want the same set of backports.

Ian.

>  running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
> 
> Shortly after the upgrade we started to lose network connectivity to the
> DomU a few times a day that required a reboot to fix. We see nothing in
> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> the following output for the two incidents we investigated in detail:
> 
> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> [69332.031069] br-default: port 2(vif4.0) entered disabled state
> 
> 
> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
> [824365.531191] br-default: port 2(vif9.0) entered disabled state
> 
> We have a very similar setup running on another machine with a 3.17.3
> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
> machine. This is a test system rather than a production system so has a
> different workload and fewer CPUs.
> 
> The piece of code that outputs the error is in
> drivers/net/xen-netback/netback.c.
> 
> The DomU has 4000MB of RAM and 8 CPUs.
> 
> Any ideas?
> 
> Thanks,
> 
> Anthony.
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/arm: Handle platforms with edge-triggered virtual timer

2014-11-28 Thread Julien Grall
I forgot to add "for 4.5" in the commit title.

On 28/11/14 15:17, Julien Grall wrote:
 > This patch is a bug fix candidate for Xen 4.5 and backport for Xen 4.4.
> It affects at least Xgene platform and ARMv8 models where Xen may
> randomly crash.

Thinking a bit more to this. I believe it's possible for a guest to
crash the host if the next timer deadline is short enough and it execute
a yield. Might be unlikely ...

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V8 2/3] libxl domain snapshot API design

2014-11-28 Thread Ian Campbell
On Tue, 2014-11-25 at 02:08 -0700, Chun Yan Liu wrote:
> Hi, Ian,
> 
> According to previous discussion, snapshot delete and revert are
> inclined to be done by high level application itself, won't supply a
> libxl API.

I thought you had explained a scenario where the toolstack needed to be
at least aware of delete, specifically when you are deleting a snapshot
from the middle of an active chain. 

Maybe that's not "snapshot delete API in libxl" though, but rather a
notification API which the toolstack can use to tell libxl something is
going on.

>  I'm wondering snapshot create need a new common API?
> In fact its main work is save domain and take disk snapshot, xl can
> do it too. 

I don't believe xl can take a disk snapshot of an active disk, it
doesn't have the machinery to deal with that sort of thing, nor should
it, this is exactly the sort of thing which libxl is provided to deal
with.

Also, libxl is driving the migration/memory snapshot, and I think the
disk snapshot fundamentally needs to be involved in that process, not
done separately by the toolstack.

> I just write down an overview of the snapshot work (see below).
> The problem is: do we need to export API? What kind of API?
> In updating Bamvor's code, I think xl can do all the work, libvirt can
> do the work too even without libxl's help.
> 
> Of course, there are some thing if put in libxl, it will be easier to
> use, like the domain snapshot info structure, gentype.py will
> directly generate useful init/dispose/to_json/from_json functions.
> Or the disk snapshot part can be extracted and placed in libxl or libxlu.
> 
> Any suggestions about which part is better to be extracted as libxl
> API or better not?
> 
> Thanks,
> Chunyan
> 
> --
> libxl domain snapshot overview

Just to be 100% clear: This is an overview of a domain snapshot
architecture for a toolstack which uses libxl. A bunch of the things
described here belong to the toolstack and not to libxl itself.

I've tried to read with that in mind but a complete document should
mention this and be careful to be clear about the distinction where it
matters.

> 0. Glossary
[...]
> * not support disk-only snapshot [1].
>
> [1]
>  This is different from "libvirt".
>  To xl, it only concerns active domains, and even when domain
>  is paused, there is no data flush to disk operation. So, take
>  a disk-only snapshot and then resume, it is as if the guest
>  had crashed. For this reason, disk-only snapshot is meaningless
>  to xl. Should not support.
> 
>  To libvirt, it has active domains and inactive domains, for
>  the active domains, as "xl", it's meaning less to take disk-only
>  snapshot, but for inactive domains, disk-only snapshot is valid.
>  Should support.

Do you mean to say here that disk-only snapshots are not supported in
some toolstacks, or in no toolstack? Or are you just saying that libxl
doesn't need to support them because they only apply to inactive
domains?

In either case it seems to me like your footnote is saying that you *do*
want to support disk-only snapshots, at least in some stacks and/or
configurations.

I think you probably mean to say that disk-only snapshots of *active*
domains are not supported. Whereas disk-only snapshots of inactive
domains may or may not be depending on the toolstack.

> 
> 2. Requirements
> 
> General Requirements:
> * ability to save/restore domain memory
> * ability to create/delete/apply disk snapshot [2]
> * ability to parse user config file
> * ability to save/load/update domain snapshot metadata (or called
>   domain snapshot info, the metadata at least includes:
>   snapshot name, create time, description, memory state file,
>   disk snapshot info, parent (in snapshot chain), current (is
>   currently applied))
> 
> [2] Disk snapshot requirements:
> * external tools: qemu-img, lvcreate, vhd-util, etc.
> * For a basic goal, we support 'raw' and 'qcow2' backend types only.
>   Then only requires qemu:
> use libxl qmp command (better) or "qemu-img"

You should leave these implementation details for a later section, in
this context they just invite quibbling about whether things belong in
libxl etc and whether qmp commands are "better".

The rest looks ok, but without the remainder of the design described in
terms of the concepts given here it's hard to comment further.

I'd suggest putting this all into one coherent document (not 3 as
before) which starts by describing the terminology (section 0 in your
mail which I'm replying to now), then gives an overview of the
architecture (the rest of that mail), then describe which components
(libxl, toolstack, etc) implement each bit of the architecture, then
describe the libxl API which makes this possible (covered in previous
mails I think).

I think you have most of the words either here or from the other mails,
they just need putting togethe

Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Andrew Cooper
On 28/11/14 15:18, Jan Beulich wrote:
 On 28.11.14 at 14:55,  wrote:
>> The problem is with continuations which reuse the upper bits of the
>> input register, not with this HVMOP_op_mask specifically; the
>> HVMOP_op_mask simply adds to an existing problem.  This is something
>> which needs considering urgently because, as you identify above, we have
>> already suffered an accidental ABI breakage with the mem-op widening.
> Since we can't retroactively fix the mem-op widening, I still don't see
> what's urgent here: As long as we don't change any of these masks,
> nothing bad is going to happen. Of course one thing to consider with
> this aspect in mind is whether to change the hvm-op or gnttab-op
> masks again _before_ 4.5 goes out, based on whether we feel they're
> wide enough for the (un)foreseeable future.

By urgent, I mean exactly this, while we have the ability to tweak the
masks.

>
>> 32bit HVM guests reliably form a continuation on every single iteration
>> of a continuable hypercall (e.g. decrease reservation, which is the base
>> of my XSA-111 PoC), making it trivial to construct a migrateable HVM
>> domain which exposes the issue.
> Hmm, I can't offhand see why that would be, but what you write
> reads like you determined the reason?

I have never identified why, but it is reliable.  c/s 79de2d31f1 proves
that some preemption condition is true for 32bit HVM guests by the time
the hypercall handler is called.  It is unreliable with 64bit HVM
guests, suggesting that the compat translation layer may be the root
cause of the issue.

> I ask because an unavoidable
> use of continuations is certainly nice for making sure those code paths
> get tested, but would be quite desirable to get eliminated at least for
> non-debug builds.

While the preemption code is necessary to avoid spending ludicrous
quantities of time synchronously in Xen, it is currently having the
worst possible effect it could have on the system, by causing 32bit HVM
guests to undergo a vmentry/vmexit and double compat translation for
each iteration.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] get a handle for the tap device to shut it down

2014-11-28 Thread Olaf Hering

Xen does a shutdown of the emulated PCI network device in
pci_unplug_nics. But this just disables the PCI device. The tap device
for a given emulated card remains active because nothing closes the
file descriptor. 

The cmdline for qemu contains something like "-device
rtl8139,id=nic0,netdev=net0,mac=00:16:3e:28:f1:ee -netdev
type=tap,id=net0,ifname=vif1.0-emu,script=no,downscript=no".

I wonder if the missing disable of the tap device is intentional, or
just an oversight, or if its just to complicated to get from a
"PCIDevice *" to the other end and call the ->cleanup function.


Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0

2014-11-28 Thread Anthony Wright
On 28/11/2014 15:23, Ian Campbell wrote:
> On Fri, 2014-11-28 at 15:19 +, Anthony Wright wrote:
>> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
>> 3.17.3
> Is this a Debian kernel? In which case you might be seeing
It's a stock kernel from kernel.org, we have a custom system with no
relation to Debian.
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 , this will be
> fixed in the next upload of the kernel, test binaries with the fixes are
> referenced in the bug log.
The error messages we're seeing are different from those reported, both
the Dom0 and DomU continue to run correctly and the vif doesn't degrade
slowly it fails the test in netback.c below which disables the interface:

/* No crossing a page as the payload mustn't fragment. */
if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
netdev_err(queue->vif->dev,
"txreq.offset: %x, size: %u, end: %lu\n",
txreq.offset, txreq.size,
(txreq.offset&~PAGE_MASK) + txreq.size);
xenvif_fatal_tx_err(queue->vif);
break;
}
> Even if not Debian then you'll probably want the same set of backports.
I'm happy to apply the backports if you think it's likely to fix the
problem despite the different symptoms, but from what I can see it looks
like a different problem.

thanks,

Anthony
> Ian.
>>  running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>>
>> Shortly after the upgrade we started to lose network connectivity to the
>> DomU a few times a day that required a reboot to fix. We see nothing in
>> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
>> the following output for the two incidents we investigated in detail:
>>
>> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
>> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
>> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>>
>>
>> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
>> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
>> [824365.531191] br-default: port 2(vif9.0) entered disabled state
>>
>> We have a very similar setup running on another machine with a 3.17.3
>> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
>> machine. This is a test system rather than a production system so has a
>> different workload and fewer CPUs.
>>
>> The piece of code that outputs the error is in
>> drivers/net/xen-netback/netback.c.
>>
>> The DomU has 4000MB of RAM and 8 CPUs.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Anthony.
>>
>> ___
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough

2014-11-28 Thread Wei Liu
On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote:
> create ^
> title it QMP connection problems prevent libxl from calling 
> libxl__device_pci_reset on domain shutdown
> thanks
> 
> On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
> > Hi,
> > 
> > While testing a patch for Konrad i was wondering why "libxl_pci.c: 
> > libxl__device_pci_reset()"
> > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci 
> > passthrough.
> > xl didn't show any problems on the commandline so i never drawed much 
> > attention 
> > to it, but /var/log/xen/xl-guest.log shows:
> > 
> > Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
> > Domain 19 has shut down, reason code 0 0x0
> > Action for shutdown reason code 0 is destroy
> > Domain 19 needs to be cleaned up: destroying the domain
> > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: Connection 
> > reset by peer
> > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to connect 
> > to QMP
> > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: 
> > Connection reset by peer
> > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model already 
> > exited
> > Done. Exiting now
> > 
> > So it doesn't even get to calling  "libxl_pci.c: libxl__device_pci_reset()".
> > 
> > --

It's worth checking whether the device model exits too early, i.e., did
it crash? What's in the DM log?

Wei.

> > Sander
> > 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.5] reset PCI devices on force removal even when QEMU returns error

2014-11-28 Thread Stefano Stabellini
On do_pci_remove when QEMU returns error, we just bail out early without
resetting the device. On domain shutdown we are racing with QEMU exiting
and most often QEMU closes the QMP connection before executing the
requested command.

In these cases if force=1, it makes sense to go ahead with rest of the
PCI device removal, that includes resetting the device and calling
xc_deassign_device. Otherwise we risk not resetting the device properly
on domain shutdown.

Signed-off-by: Stefano Stabellini 

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 316643c..0ac0b93 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1243,7 +1245,7 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
 rc = ERROR_INVAL;
 goto out_fail;
 }
-if (rc) {
+if (rc && !force) {
 rc = ERROR_FAIL;
 goto out_fail;
 }

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough

2014-11-28 Thread Stefano Stabellini
On Fri, 28 Nov 2014, Wei Liu wrote:
> On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote:
> > create ^
> > title it QMP connection problems prevent libxl from calling 
> > libxl__device_pci_reset on domain shutdown
> > thanks
> > 
> > On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
> > > Hi,
> > > 
> > > While testing a patch for Konrad i was wondering why "libxl_pci.c: 
> > > libxl__device_pci_reset()"
> > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci 
> > > passthrough.
> > > xl didn't show any problems on the commandline so i never drawed much 
> > > attention 
> > > to it, but /var/log/xen/xl-guest.log shows:
> > > 
> > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
> > > Domain 19 has shut down, reason code 0 0x0
> > > Action for shutdown reason code 0 is destroy
> > > Domain 19 needs to be cleaned up: destroying the domain
> > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: Connection 
> > > reset by peer
> > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to 
> > > connect to QMP
> > > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: 
> > > Connection reset by peer
> > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model already 
> > > exited
> > > Done. Exiting now
> > > 
> > > So it doesn't even get to calling  "libxl_pci.c: 
> > > libxl__device_pci_reset()".
> > > 
> > > --
> 
> It's worth checking whether the device model exits too early, i.e., did
> it crash? What's in the DM log?

Firstly I was thinking that if force=1 it makes sense to continue even
when QEMU returns error, see:

http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-linus bisection] complete test-amd64-i386-xl-credit2

2014-11-28 Thread xen . org
branch xen-unstable
xen branch xen-unstable
job test-amd64-i386-xl-credit2
test guest-saverestore

Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
  Bug introduced:  c6c9161d064d30e78904f3affe5184487493e0fc
  Bug not present: 8b2ed21e846c63d8f1bdee0d8df0645721a604a1


  commit c6c9161d064d30e78904f3affe5184487493e0fc
  Merge: 8b2ed21 b5e212a
  Author: Linus Torvalds 
  Date:   Fri Nov 21 15:46:17 2014 -0800
  
  Merge branch 'x86-urgent-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  
  Pull x86 fixes from Thomas Gleixner:
   "Misc fixes:
 - gold linker build fix
 - noxsave command line parsing fix
 - bugfix for NX setup
 - microcode resume path bug fix
 - _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious
   lockup thread"
  
  * 'x86-urgent-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
x86, kaslr: Handle Gold linker for finding bss/brk
x86, mm: Set NX across entire PMD at boot
x86, microcode: Update BSPs microcode on resume
x86: Require exact match for 'noxsave' command line option
  
  commit b5e212a3051b65e426a513901d9c7001681c7215
  Author: Andy Lutomirski 
  Date:   Wed Nov 19 13:56:19 2014 -0800
  
  x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  
  TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
  _TIF_SINGLESTEP), not (1<<19).
  
  This code is involved in Dave's trinity lockup, but I don't see why
  it would cause any of the problems he's seeing, except inadvertently
  by causing a different path through entry_64.S's syscall handling.
  
  Signed-off-by: Andy Lutomirski 
  Cc: Don Zickus 
  Cc: Peter Zijlstra 
  Cc: Dave Jones 
  Cc: Linus Torvalds 
  Link: 
http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.l...@amacapital.net
  Signed-off-by: Thomas Gleixner 
  
  commit 70b61e362187b5fccac206506d402f3424e3e749
  Author: Kees Cook 
  Date:   Mon Nov 17 16:16:04 2014 -0800
  
  x86, kaslr: Handle Gold linker for finding bss/brk
  
  When building with the Gold linker, the .bss and .brk areas of vmlinux
  are shown as consecutive instead of having the same file offset. Allow
  for either state, as long as things add up correctly.
  
  Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
  Reported-by: Markus Trippelsdorf 
  Signed-off-by: Kees Cook 
  Cc: Junjie Mao 
  Link: http://lkml.kernel.org/r/20141118001604.ga25...@www.outflux.net
  Cc: sta...@vger.kernel.org
  Signed-off-by: Thomas Gleixner 
  
  commit 45e2a9d4701d8c624d4a4bcdd1084eae31e92f58
  Author: Kees Cook 
  Date:   Fri Nov 14 11:47:37 2014 -0800
  
  x86, mm: Set NX across entire PMD at boot
  
  When setting up permissions on kernel memory at boot, the end of the
  PMD that was split from bss remained executable. It should be NX like
  the rest. This performs a PMD alignment instead of a PAGE alignment to
  get the correct span of memory.
  
  Before:
  ---[ High Kernel Mapping ]---
  ...
  0x8202d000-0x8220  1868K RW   GLB NX pte
  0x8220-0x82c010M RW   PSE GLB NX pmd
  0x82c0-0x82df5000  2004K RW   GLB NX pte
  0x82df5000-0x82e044K RW   GLB x  pte
  0x82e0-0xc000   978M pmd
  
  After:
  ---[ High Kernel Mapping ]---
  ...
  0x8202d000-0x8220  1868K RW   GLB NX pte
  0x8220-0x82e012M RW   PSE GLB NX pmd
  0x82e0-0xc000   978M pmd
  
  [ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment.
  We really should unmap the reminder along with the holes
  caused by init,initdata etc. but thats a different issue ]
  
  Signed-off-by: Kees Cook 
  Cc: Andy Lutomirski 
  Cc: Toshi Kani 
  Cc: Yasuaki Ishimatsu 
  Cc: David Vrabel 
  Cc: Wang Nan 
  Cc: Yinghai Lu 
  Cc: sta...@vger.kernel.org
  Link: http://lkml.kernel.org/r/20141114194737.ga3...@www.outflux.net
  Signed-off-by: Thomas Gleixner 
  
  commit fb86b97300d930b57471068720c52bfa8622eab7
  Autho

Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough

2014-11-28 Thread Wei Liu
On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote:
> On Fri, 28 Nov 2014, Wei Liu wrote:
> > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote:
> > > create ^
> > > title it QMP connection problems prevent libxl from calling 
> > > libxl__device_pci_reset on domain shutdown
> > > thanks
> > > 
> > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
> > > > Hi,
> > > > 
> > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: 
> > > > libxl__device_pci_reset()"
> > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci 
> > > > passthrough.
> > > > xl didn't show any problems on the commandline so i never drawed much 
> > > > attention 
> > > > to it, but /var/log/xen/xl-guest.log shows:
> > > > 
> > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
> > > > Domain 19 has shut down, reason code 0 0x0
> > > > Action for shutdown reason code 0 is destroy
> > > > Domain 19 needs to be cleaned up: destroying the domain
> > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: 
> > > > Connection reset by peer
> > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to 
> > > > connect to QMP
> > > > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: 
> > > > Connection reset by peer
> > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model 
> > > > already exited
> > > > Done. Exiting now
> > > > 
> > > > So it doesn't even get to calling  "libxl_pci.c: 
> > > > libxl__device_pci_reset()".
> > > > 
> > > > --
> > 
> > It's worth checking whether the device model exits too early, i.e., did
> > it crash? What's in the DM log?
> 
> Firstly I was thinking that if force=1 it makes sense to continue even
> when QEMU returns error, see:
> 
> http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com

In normal case DM is not destroyed until PCI device is removed. So I
think the real issue is DM crashed.

libxl.c:
1571 if (libxl__device_pci_destroy_all(gc, domid) < 0)
1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for domid 
%d", domid);
1573 rc = xc_domain_pause(ctx->xch, domid);
1574 if (rc < 0) {
1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause 
failed for %d", domid);
1576 }
1577 if (dm_present) {
1578 if (libxl__destroy_device_model(gc, domid) < 0)
1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl__destroy_device_model 
failed for %d", domid);
1580
1581 libxl__qmp_cleanup(gc, domid);
1582 }

The patch you posted covers up the real issue.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough

2014-11-28 Thread Stefano Stabellini
On Fri, 28 Nov 2014, Wei Liu wrote:
> On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote:
> > On Fri, 28 Nov 2014, Wei Liu wrote:
> > > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote:
> > > > create ^
> > > > title it QMP connection problems prevent libxl from calling 
> > > > libxl__device_pci_reset on domain shutdown
> > > > thanks
> > > > 
> > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
> > > > > Hi,
> > > > > 
> > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: 
> > > > > libxl__device_pci_reset()"
> > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with 
> > > > > pci passthrough.
> > > > > xl didn't show any problems on the commandline so i never drawed much 
> > > > > attention 
> > > > > to it, but /var/log/xen/xl-guest.log shows:
> > > > > 
> > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
> > > > > Domain 19 has shut down, reason code 0 0x0
> > > > > Action for shutdown reason code 0 is destroy
> > > > > Domain 19 needs to be cleaned up: destroying the domain
> > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: 
> > > > > Connection reset by peer
> > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to 
> > > > > connect to QMP
> > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: 
> > > > > Connection reset by peer
> > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model 
> > > > > already exited
> > > > > Done. Exiting now
> > > > > 
> > > > > So it doesn't even get to calling  "libxl_pci.c: 
> > > > > libxl__device_pci_reset()".
> > > > > 
> > > > > --
> > > 
> > > It's worth checking whether the device model exits too early, i.e., did
> > > it crash? What's in the DM log?
> > 
> > Firstly I was thinking that if force=1 it makes sense to continue even
> > when QEMU returns error, see:
> > 
> > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com
> 
> In normal case DM is not destroyed until PCI device is removed. So I
> think the real issue is DM crashed.

I don't think it crashed: QEMU detects that domain shutdown has been
called and exits.
See:

static void main_loop(void)
{
bool nonblocking;
int last_io = 0;
#ifdef CONFIG_PROFILER
int64_t ti;
#endif
do {

[...]

} while (!main_loop_should_exit());
}


main_loop_should_exit returns true when qemu_shutdown_requested().


> libxl.c:
> 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0)
> 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for domid 
> %d", domid);
> 1573 rc = xc_domain_pause(ctx->xch, domid);
> 1574 if (rc < 0) {
> 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause 
> failed for %d", domid);
> 1576 }
> 1577 if (dm_present) {
> 1578 if (libxl__destroy_device_model(gc, domid) < 0)
> 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, 
> "libxl__destroy_device_model failed for %d", domid);
> 1580
> 1581 libxl__qmp_cleanup(gc, domid);
> 1582 }
> 
> The patch you posted covers up the real issue.

That is true, but I think the patch is correct regardless of the issue.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough

2014-11-28 Thread Sander Eikelenboom

Friday, November 28, 2014, 6:09:52 PM, you wrote:

> On Fri, 28 Nov 2014, Wei Liu wrote:
>> On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote:
>> > On Fri, 28 Nov 2014, Wei Liu wrote:
>> > > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote:
>> > > > create ^
>> > > > title it QMP connection problems prevent libxl from calling 
>> > > > libxl__device_pci_reset on domain shutdown
>> > > > thanks
>> > > > 
>> > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
>> > > > > Hi,
>> > > > > 
>> > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: 
>> > > > > libxl__device_pci_reset()"
>> > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with 
>> > > > > pci passthrough.
>> > > > > xl didn't show any problems on the commandline so i never drawed 
>> > > > > much attention 
>> > > > > to it, but /var/log/xen/xl-guest.log shows:
>> > > > > 
>> > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
>> > > > > Domain 19 has shut down, reason code 0 0x0
>> > > > > Action for shutdown reason code 0 is destroy
>> > > > > Domain 19 needs to be cleaned up: destroying the domain
>> > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: 
>> > > > > Connection reset by peer
>> > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to 
>> > > > > connect to QMP
>> > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: 
>> > > > > libxl__qmp_pci_del: Connection reset by peer
>> > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model 
>> > > > > already exited
>> > > > > Done. Exiting now
>> > > > > 
>> > > > > So it doesn't even get to calling  "libxl_pci.c: 
>> > > > > libxl__device_pci_reset()".
>> > > > > 
>> > > > > --
>> > > 
>> > > It's worth checking whether the device model exits too early, i.e., did
>> > > it crash? What's in the DM log?
>> > 
>> > Firstly I was thinking that if force=1 it makes sense to continue even
>> > when QEMU returns error, see:
>> > 
>> > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com
>> 
>> In normal case DM is not destroyed until PCI device is removed. So I
>> think the real issue is DM crashed.

> I don't think it crashed: QEMU detects that domain shutdown has been
> called and exits.
> See:

> static void main_loop(void)
> {
> bool nonblocking;
> int last_io = 0;
> #ifdef CONFIG_PROFILER
> int64_t ti;
> #endif
> do {

> [...]

> } while (!main_loop_should_exit());
> }


> main_loop_should_exit returns true when qemu_shutdown_requested().


>> libxl.c:
>> 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0)
>> 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for 
>> domid %d", domid);
>> 1573 rc = xc_domain_pause(ctx->xch, domid);
>> 1574 if (rc < 0) {
>> 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause 
>> failed for %d", domid);
>> 1576 }
>> 1577 if (dm_present) {
>> 1578 if (libxl__destroy_device_model(gc, domid) < 0)
>> 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, 
>> "libxl__destroy_device_model failed for %d", domid);
>> 1580
>> 1581 libxl__qmp_cleanup(gc, domid);
>> 1582 }
>> 
>> The patch you posted covers up the real issue.

> That is true, but I think the patch is correct regardless of the issue.

Hmm how about qemu's "-no-shutdown" option ?
>From the manpage:
-no-shutdown
Don't exit QEMU on guest shutdown, but instead only stop the emulation. 
This allows for instance switching to monitor to commit changes to the disk 
image. 

with:
device_model_args_hvm = [ '-no-shutdown' ]

I get:

/var/log/xen/xl-tv.log:

Waiting for domain tv (domid 18) to die [pid 18587]
Domain 18 has shut down, reason code 0 0x0
Action for shutdown reason code 0 is destroy
Domain 18 needs to be cleaned up: destroying the domain
libxl: error: libxl_pci.c:1299:do_pci_remove: xc_domain_irq_permission irq=40: 
Invalid argument
libxl: error: libxl_pci.c:1012:libxl__device_pci_reset: ?!?!? Me here
Done. Exiting now

/var/log/xen/qemu-dm-tv.log:

char device redirected to /dev/pts/21 (label serial0)
VNC server running on `127.0.0.1:5901'
xen be: vkbd-0: initialise() failed
xen be: vkbd-0: initialise() failed
xen be: vkbd-0: initialise() failed
Issued domain 18 poweroff
qemu: terminating on signal 1 from pid 18587


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [qemu-mainline test] 31886: regressions - FAIL

2014-11-28 Thread xen . org
flight 31886 qemu-mainline real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31886/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-debianhvm-amd64  5 xen-boot fail REGR. vs. 31855

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 31855

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  9 guest-start  fail   never pass
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass

version targeted for testing:
 qemuu490309fcfbed9fa1ed357541f609975016a34628
baseline version:
 qemuuca6028185d19d3f2bd331c15175c3ef5afc30c77


People who touched revisions under test:
  Christian Borntraeger 
  Don Slutz 
  Eric Blake 
  Gerd Hoffmann 
  Gonglei 
  Markus Armbruster 
  Paolo Bonzini 
  Peter Maydell 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64fail
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-amd64-xl-pcipt-intel  fail
 test-amd64-i386-rhel6hvm-intel   pass
 test-amd64-i386-qemut-rhel6hvm-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-libvirt   

Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough

2014-11-28 Thread Stefano Stabellini
On Fri, 28 Nov 2014, Sander Eikelenboom wrote:
> Friday, November 28, 2014, 6:09:52 PM, you wrote:
> 
> > On Fri, 28 Nov 2014, Wei Liu wrote:
> >> On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote:
> >> > On Fri, 28 Nov 2014, Wei Liu wrote:
> >> > > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote:
> >> > > > create ^
> >> > > > title it QMP connection problems prevent libxl from calling 
> >> > > > libxl__device_pci_reset on domain shutdown
> >> > > > thanks
> >> > > > 
> >> > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
> >> > > > > Hi,
> >> > > > > 
> >> > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: 
> >> > > > > libxl__device_pci_reset()"
> >> > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) 
> >> > > > > with pci passthrough.
> >> > > > > xl didn't show any problems on the commandline so i never drawed 
> >> > > > > much attention 
> >> > > > > to it, but /var/log/xen/xl-guest.log shows:
> >> > > > > 
> >> > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
> >> > > > > Domain 19 has shut down, reason code 0 0x0
> >> > > > > Action for shutdown reason code 0 is destroy
> >> > > > > Domain 19 needs to be cleaned up: destroying the domain
> >> > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: 
> >> > > > > Connection reset by peer
> >> > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to 
> >> > > > > connect to QMP
> >> > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: 
> >> > > > > libxl__qmp_pci_del: Connection reset by peer
> >> > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model 
> >> > > > > already exited
> >> > > > > Done. Exiting now
> >> > > > > 
> >> > > > > So it doesn't even get to calling  "libxl_pci.c: 
> >> > > > > libxl__device_pci_reset()".
> >> > > > > 
> >> > > > > --
> >> > > 
> >> > > It's worth checking whether the device model exits too early, i.e., did
> >> > > it crash? What's in the DM log?
> >> > 
> >> > Firstly I was thinking that if force=1 it makes sense to continue even
> >> > when QEMU returns error, see:
> >> > 
> >> > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com
> >> 
> >> In normal case DM is not destroyed until PCI device is removed. So I
> >> think the real issue is DM crashed.
> 
> > I don't think it crashed: QEMU detects that domain shutdown has been
> > called and exits.
> > See:
> 
> > static void main_loop(void)
> > {
> > bool nonblocking;
> > int last_io = 0;
> > #ifdef CONFIG_PROFILER
> > int64_t ti;
> > #endif
> > do {
> 
> > [...]
> 
> > } while (!main_loop_should_exit());
> > }
> 
> 
> > main_loop_should_exit returns true when qemu_shutdown_requested().
> 
> 
> >> libxl.c:
> >> 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0)
> >> 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for 
> >> domid %d", domid);
> >> 1573 rc = xc_domain_pause(ctx->xch, domid);
> >> 1574 if (rc < 0) {
> >> 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, 
> >> "xc_domain_pause failed for %d", domid);
> >> 1576 }
> >> 1577 if (dm_present) {
> >> 1578 if (libxl__destroy_device_model(gc, domid) < 0)
> >> 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, 
> >> "libxl__destroy_device_model failed for %d", domid);
> >> 1580
> >> 1581 libxl__qmp_cleanup(gc, domid);
> >> 1582 }
> >> 
> >> The patch you posted covers up the real issue.
> 
> > That is true, but I think the patch is correct regardless of the issue.
> 
> Hmm how about qemu's "-no-shutdown" option ?
> >From the manpage:
> -no-shutdown
> Don't exit QEMU on guest shutdown, but instead only stop the emulation. 
> This allows for instance switching to monitor to commit changes to the disk 
> image. 
> 
> with:
> device_model_args_hvm = [ '-no-shutdown' ]
> 
> I get:
> 
> /var/log/xen/xl-tv.log:
> 
> Waiting for domain tv (domid 18) to die [pid 18587]
> Domain 18 has shut down, reason code 0 0x0
> Action for shutdown reason code 0 is destroy
> Domain 18 needs to be cleaned up: destroying the domain
> libxl: error: libxl_pci.c:1299:do_pci_remove: xc_domain_irq_permission 
> irq=40: Invalid argument

Thanks for testing! The suggestion might be a good idea, but I am not
sure whether it is suitable given the stage of the release cycle. I
would wait until 4.6 before making that change.

Regarding the xc_domain_irq_permission error, it is just a matter of
calling xc_physdev_unmap_pirq after xc_domain_irq_permission:


diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 316643c..904d255 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1288,14 +1290,14 @@ skip1:
 goto out;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
-rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
-if (rc < 0) {
-LIBXL__LOG_ERRNO

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-11-28 Thread Luis R. Rodriguez
On Thu, Nov 27, 2014 at 06:50:31PM +, Andrew Cooper wrote:
> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> > On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>> From: "Luis R. Rodriguez" 
> >>>
> >>> Some folks had reported that some xen hypercalls take a long time
> >>> to complete when issued from the userspace private ioctl mechanism,
> >>> this can happen for instance with some hypercalls that have many
> >>> sub-operations, this can happen for instance on hypercalls that use
> >>> multi-call feature whereby Xen lets one hypercall batch out a series
> >>> of other hypercalls on the hypervisor. At times such hypercalls can
> >>> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
> >>> 120 seconds), this a non-issue issue on preemptible kernels though as
> >>> the kernel may deschedule such long running tasks. Xen for instance
> >>> supports multicalls to be preempted as well, this is what Xen calls
> >>> continuation (see xen commit 42217cbc5b which introduced this [0]).
> >>> On systems without CONFIG_PREEMPT though -- a kernel with voluntary
> >>> or no preemption -- a long running hypercall will not be descheduled
> >>> until the hypercall is complete and the ioctl returns to user space.
> >>>
> >>> To help with this David had originally implemented support for use
> >>> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
> >>> solution never went upstream though and upon review to help refactor
> >>> this I've concluded that usage of preempt_schedule_irq() would be
> >>> a bit abussive of existing APIs -- for a few reasons:
> >>>
> >>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
> >>>
> >>> 1) we want try to consider solutions that might work for other
> >>> hypervisors for this same problem, and identify it its an issue
> >>> even present on other hypervisors or if this is a self
> >>> inflicted architectural issue caused by use of multicalls
> >>>
> >>> 2) there is no documentation or profiling of the exact hypercalls
> >>> that were causing these issues, nor do we have any context
> >>> to help evaluate this any further
> >>>
> >>> I at least checked with kvm folks and it seems hypercall preemption
> >>> is not needed there. We can survey other hypervisors...
> >>>
> >>> If 'something like preemption' is needed then CONFIG_PREEMPT
> >>> should just be enabled and encouraged, it seems we want to
> >>> encourage CONFIG_PREEMPT on xen, specially when multicalls are
> >>> used. In the meantime this tries to address a solution to help
> >>> xen on non CONFIG_PREEMPT kernels.
> >>>
> >>> One option tested and evaluated was to put private hypercalls in
> >>> process context, however this would introduce complexities such
> >>> originating hypercalls from different contexts. Current xen
> >>> hypercall callback handlers would need to be changed per architecture,
> >>> for instance, we'd also incur the cost of switching states from
> >>> user / kernel (this cost is also present if preempt_schedule_irq()
> >>> is used). There may be other issues which could be introduced with
> >>> this strategy as well. The simplest *shared* alternative is instead
> >>> to just explicitly schedule() at the end of a private hypercall on non
> >>> preempt kernels. This forces our private hypercall call mechanism
> >>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
> >>> more context switch but keeps the private hypercall context intact.
> >>>
> >>> [0] 
> >>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
> >>> [1] 
> >>> http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
> >>>
> >>> Cc: Davidlohr Bueso 
> >>> Cc: Joerg Roedel 
> >>> Cc: Borislav Petkov 
> >>> Cc: Konrad Rzeszutek Wilk 
> >>> Cc: Jan Beulich 
> >>> Cc: Juergen Gross 
> >>> Cc: Olaf Hering 
> >>> Cc: David Vrabel 
> >>> Signed-off-by: Luis R. Rodriguez 
> >>> ---
> >>>   drivers/xen/privcmd.c | 3 +++
> >>>   1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> >>> index 569a13b..e29edba 100644
> >>> --- a/drivers/xen/privcmd.c
> >>> +++ b/drivers/xen/privcmd.c
> >>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>>  hypercall.arg[0], hypercall.arg[1],
> >>>  hypercall.arg[2], hypercall.arg[3],
> >>>  hypercall.arg[4]);
> >>> +#ifndef CONFIG_PREEMPT
> >>> + schedule();
> >>> +#endif
> >>>
> >>>   return ret;
> >>>   }
> >>>
> >> Sorry, I don't think this will solve anything. You're calling schedule()
> >> right after the long running hypercall just nanoseconds before returning
> >> to the user.
> > Yeah, well that is what [1] tried as well only it tried usi

  1   2   >