Re: [Xen-devel] Regression, host crash with 4.5rc1
On 11/27/2014 01:27 AM, Jan Beulich wrote: This was precisely the reason why I told you that the numbering differs (and is confusing and has nothing to do with actual C state numbers): What max_cstate refers to in the mwait-idle driver is what above is listed as type[Cx], i.e. the state at index 1 is C1, at 2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the numbering the CPU documentation uses, it's rather kind of meant to refer to the ACPI numbering (but probably also not fully matching up). Ah, thanks for the explanation. So max_cstate=2 working suggests a problem with what the CPU calls C6, which presumably isn't all that surprising considering the many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose you already made sure you run with the latest available BIOS. Yes, latest available BIOS. And with 6 errata documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified to be the culprit by just forcing arch_skip_send_event_check() to always return false) could be reasonably assumed to be broken only when MWAIT use for all C states didn't work. Now I did get a hang with max_cstate=3 and mwait-idle=0. May I assume that mwait-idle=0 means that ACPI is responsible for the throttling? Thanks again for all your help! Steve ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug
On Fri, 2014-11-28 at 00:28 +, M A Young wrote: > Migrations with xl migrate --debug will fail because debugging information > from the receiving process is written to the stdout channel. This channel > is also used for status messages so the migration will fail as the sending > process receives an unexpected message. This patch moves the debugging > information to the stderr channel. > > Version 2 moves some output back to stdout that was accidentally moved > to stderr in the first version. > > Signed-off-by: Michael Young I think you've forgotten the attachment. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm
XenGT (Intel Graphics Virtualization technology, please refer to https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization- xengt) driver runs inside Dom0 as a virtual graphics device model, and needs to trap and emulate the guest's write operations to some specific memory pages, like memory pages used by guest graphics driver as PPGTT(per-process graphics translation table). We agreed to add a new p2m type "p2m_mmio_write_dm" to trap the write operation on these graphic page tables. Handling of this new p2m type are similar with existing p2m_ram_ro in most condition checks, with only difference on final policy of emulation vs. drop. For p2m_mmio_write_dm type pages, writes will go to the device model via ioreq-server. Previously, the conclusion in our v3 patch review is to provide a more generalized HVMOP_map_io_range_to_ioreq_server hypercall, by seperating rangesets inside a ioreq server to read-protected/write- protected/both-prtected. Yet, after offline discussion with Paul, we believe a more simplified solution may suffice. We can keep the existing HVMOP_map_io_range_to_ioreq_server hypercall, and let the user decide whether or not a p2m type change is necessary, because in most cases the emulator will already use the p2m_mmio_dm type. Changes from v3: - Use the existing HVMOP_map_io_range_to_ioreq_server hypercall to add write protected range. - Modify the HVMOP_set_mem_type hypercall to support the new p2m type for this range Changes from v2: - Remove excute attribute of the new p2m type p2m_mmio_write_dm - Use existing rangeset for keeping the write protection page range instead of introducing hash table. - Some code style fix. Changes from v1: - Changes the new p2m type name from p2m_ram_wp to p2m_mmio_write_dm. This means that we treat the pages as a special mmio range instead of ram. - Move macros to c file since only this file is using them. - Address various comments from Jan. Signed-off-by: Yu Zhang Signed-off-by: Wei Ye --- xen/arch/x86/hvm/hvm.c | 13 ++--- xen/arch/x86/mm/p2m-ept.c | 1 + xen/arch/x86/mm/p2m-pt.c| 1 + xen/include/asm-x86/p2m.h | 1 + xen/include/public/hvm/hvm_op.h | 1 + 5 files changed, 14 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 8f49b44..5f806e8 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -2838,7 +2838,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, * to the mmio handler. */ if ( (p2mt == p2m_mmio_dm) || - (npfec.write_access && (p2mt == p2m_ram_ro)) ) + (npfec.write_access && + ((p2mt == p2m_ram_ro) || (p2mt == p2m_mmio_write_dm))) ) { put_gfn(p2m->domain, gfn); @@ -5922,6 +5923,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) a.mem_type = HVMMEM_ram_rw; else if ( p2m_is_grant(t) ) a.mem_type = HVMMEM_ram_rw; +else if ( t == p2m_mmio_write_dm ) +a.mem_type = HVMMEM_mmio_write_dm; else a.mem_type = HVMMEM_mmio_dm; rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0; @@ -5941,7 +5944,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) static const p2m_type_t memtype[] = { [HVMMEM_ram_rw] = p2m_ram_rw, [HVMMEM_ram_ro] = p2m_ram_ro, -[HVMMEM_mmio_dm] = p2m_mmio_dm +[HVMMEM_mmio_dm] = p2m_mmio_dm, +[HVMMEM_mmio_write_dm] = p2m_mmio_write_dm }; if ( copy_from_guest(&a, arg, 1) ) @@ -5987,14 +5991,17 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) rc = -EAGAIN; goto param_fail4; } + if ( !p2m_is_ram(t) && - (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) ) + (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) && + t != p2m_mmio_write_dm ) { put_gfn(d, pfn); goto param_fail4; } rc = p2m_change_type_one(d, pfn, t, memtype[a.hvmmem_type]); + put_gfn(d, pfn); if ( rc ) goto param_fail4; diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 15c6e83..e21a92d 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -136,6 +136,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces entry->x = 0; break; case p2m_grant_map_ro: +case p2m_mmio_write_dm: entry->r = 1; entry->w = entry->x = 0; break; diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index e48b63a..26fb18d 100644 --- a/xen/arch/x86/mm/p2m-pt.c +++ b/xen/arch/x86/mm/p2m-pt.c @@ -94,6 +94,7 @@ static unsigned long p2m_type_t
Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug
On Fri, 28 Nov 2014, Ian Campbell wrote: On Fri, 2014-11-28 at 00:28 +, M A Young wrote: Migrations with xl migrate --debug will fail because debugging information from the receiving process is written to the stdout channel. This channel is also used for status messages so the migration will fail as the sending process receives an unexpected message. This patch moves the debugging information to the stderr channel. Version 2 moves some output back to stdout that was accidentally moved to stderr in the first version. Signed-off-by: Michael Young I think you've forgotten the attachment. Yes, I did. Here it is.Use stderr for debugging messages for xl migrate --debug as stdout is used for status messages. --- xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c.orig 2014-10-24 15:22:40.0 +0100 +++ xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c 2014-11-26 22:41:41.697043321 + @@ -380,10 +380,10 @@ } static void printf_info(enum output_format output_format, int domid, -libxl_domain_config *d_config) +libxl_domain_config *d_config, FILE *fh) { if (output_format == OUTPUT_FORMAT_SXP) -return printf_info_sexp(domid, d_config); +return printf_info_sexp(domid, d_config, fh); const char *buf; libxl_yajl_length len = 0; @@ -404,7 +404,7 @@ if (s != yajl_gen_status_ok) goto out; -puts(buf); +fputs(buf, fh); out: yajl_gen_free(hand); @@ -413,7 +413,13 @@ fprintf(stderr, "unable to format domain config as JSON (YAJL:%d)\n", s); -if (ferror(stdout) || fflush(stdout)) { perror("stdout"); exit(-1); } +if (ferror(fh) || fflush(fh)) { +if (fh == stdout) +perror("stdout"); +else +perror("stderr"); +exit(-1); +} } static int do_daemonize(char *name) @@ -2423,7 +2429,7 @@ } if (!dom_info->quiet) -printf("Parsing config from %s\n", config_source); +fprintf(stderr, "Parsing config from %s\n", config_source); if (config_in_json) { libxl_domain_config_from_json(ctx, &d_config, @@ -2451,7 +2457,7 @@ } if (debug || dom_info->dryrun) -printf_info(default_output_format, -1, &d_config); +printf_info(default_output_format, -1, &d_config, stderr); ret = 0; if (dom_info->dryrun) @@ -3403,7 +3409,7 @@ if (default_output_format == OUTPUT_FORMAT_JSON) s = printf_info_one_json(hand, info[i].domid, &d_config); else -printf_info_sexp(info[i].domid, &d_config); +printf_info_sexp(info[i].domid, &d_config, stdout); libxl_domain_config_dispose(&d_config); if (s != yajl_gen_status_ok) goto out; @@ -4725,7 +4731,7 @@ parse_config_data(filename, config_data, config_len, &d_config); if (debug || dryrun_only) -printf_info(default_output_format, -1, &d_config); +printf_info(default_output_format, -1, &d_config, stdout); if (!dryrun_only) { fprintf(stderr, "setting dom%d configuration\n", domid); --- xen-4.5.0-rc1/tools/libxl/xl_sxp.c.orig 2014-10-24 15:22:40.0 +0100 +++ xen-4.5.0-rc1/tools/libxl/xl_sxp.c 2014-11-26 22:30:58.416394082 + @@ -30,7 +30,7 @@ /* In general you should not add new output to this function since it * is intended only for legacy use. */ -void printf_info_sexp(int domid, libxl_domain_config *d_config) +void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh) { int i; libxl_dominfo info; @@ -38,195 +38,195 @@ libxl_domain_create_info *c_info = &d_config->c_info; libxl_domain_build_info *b_info = &d_config->b_info; -printf("(domain\n\t(domid %d)\n", domid); -printf("\t(create_info)\n"); -printf("\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM); -printf("\t(hap %s)\n", libxl_defbool_to_string(c_info->hap)); -printf("\t(oos %s)\n", libxl_defbool_to_string(c_info->oos)); -printf("\t(ssidref %d)\n", c_info->ssidref); -printf("\t(name %s)\n", c_info->name); +fprintf(fh, "(domain\n\t(domid %d)\n", domid); +fprintf(fh, "\t(create_info)\n"); +fprintf(fh, "\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM); +fprintf(fh, "\t(hap %s)\n", libxl_defbool_to_string(c_info->hap)); +fprintf(fh, "\t(oos %s)\n", libxl_defbool_to_string(c_info->oos)); +fprintf(fh, "\t(ssidref %d)\n", c_info->ssidref); +fprintf(fh, "\t(name %s)\n", c_info->name); /* retrieve the UUID from dominfo, since it is probably generated * during parsing and thus does not match the real one */ if (libxl_domain_info(ctx, &info, domid) == 0) { -printf("\t(uuid " LIBXL_UUID_FMT ")\n", LIBXL_UUID_BYTES(info.uuid)); +fprintf(fh, "\t(uuid " LIBXL_UUID_FMT ")\n", LIBXL_UUID_BYTES(info.uuid)); } else { -printf("\t(uuid )\n"); +fprintf(fh, "\t(uuid )\n");
Re: [Xen-devel] Regression, host crash with 4.5rc1
>>> On 28.11.14 at 09:24, wrote: >> And with 6 errata >> documented it's not all that unlikely that there's a 7th one with >> MONITOR/MWAIT behavior. The commit you bisected to (and >> which you had verified to be the culprit by just forcing >> arch_skip_send_event_check() to always return false) could be >> reasonably assumed to be broken only when MWAIT use for all >> C states didn't work. > > Now I did get a hang with max_cstate=3 and mwait-idle=0. According to the data you provided earlier, max_cstate=3 is identical to not using that option at all when you also use mwait-idle=0. It would make a difference only when not using that latter option (and I specifically pointed this out in earlier replies). > May I assume > that mwait-idle=0 means that ACPI is responsible for the throttling? ACPI is providing the data to do C state management in that case, but it's still Xen doing things. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass
flight 31882 xen-4.4-testing real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/31882/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 31781 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-sedf-pin 3 host-install(3) broken REGR. vs. 31781 Tests which did not succeed, but are not blocking: test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-libvirt 9 guest-start fail never pass test-armhf-armhf-libvirt 9 guest-start fail never pass test-amd64-amd64-libvirt 9 guest-start fail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass build-amd64-rumpuserxen 6 xen-buildfail never pass build-i386-rumpuserxen6 xen-buildfail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xend-winxpsp3 17 leak-check/check fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass version targeted for testing: xen a39f202031d7f1d8d9e14b8c3d7d11c812db253e baseline version: xen 7679aeb444ed3bc4de0f473c16c47eab7d2f9d33 People who touched revisions under test: Jan Beulich jobs: build-amd64-xend pass build-i386-xend pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-amd64-amd64-xl-win7-amd64 fail test-amd64-i386-xl-win7-amd64
Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest
>>> On 28.11.14 at 04:28, wrote: > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -4287,7 +4287,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, > unsigned int *ebx, > !host_tsc_is_safe() ) > *edx &= ~cpufeat_mask(X86_FEATURE_RDTSCP); > /* Hide 1GB-superpage feature if we can't emulate it. */ > -if (!hvm_pse1gb_supported(d)) > +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d)) > *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB); With #define hvm_pse1gb_supported(d) \ (cpu_has_page1gb && paging_mode_hap(d)) the change above is pointless. While, considering this, comments on v2 may have been misleading, you should have simply updated the patch description instead to clarify why the v2 change was okay even for the shadow mode case. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass
>>> On 28.11.14 at 10:07, wrote: > flight 31882 xen-4.4-testing real [real] > http://www.chiark.greenend.org.uk/~xensrcts/logs/31882/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. > 31781 Wasn't the swiotlb problem supposed to be dealt with by now? swiotlb=65536 looks to be in place here, yet that still appears to not be big enough... Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Regression, host crash with 4.5rc1
On Nov 28, 2014, at 00:50, Jan Beulich wrote: On 28.11.14 at 09:24, wrote: >>> And with 6 errata >>> documented it's not all that unlikely that there's a 7th one with >>> MONITOR/MWAIT behavior. The commit you bisected to (and >>> which you had verified to be the culprit by just forcing >>> arch_skip_send_event_check() to always return false) could be >>> reasonably assumed to be broken only when MWAIT use for all >>> C states didn't work. >> >> Now I did get a hang with max_cstate=3 and mwait-idle=0. > > According to the data you provided earlier, max_cstate=3 is > identical to not using that option at all when you also use > mwait-idle=0. It would make a difference only when not using > that latter option (and I specifically pointed this out in earlier > replies). > Apologies for asking you to repeat yourself. Most of this stuff is over my head -- the only time I was this far down the rabbit hole was on an 8051. Thanks for your patience. :-) Steve ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm
>>> On 28.11.14 at 08:59, wrote: > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -2838,7 +2838,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned > long gla, > * to the mmio handler. > */ > if ( (p2mt == p2m_mmio_dm) || > - (npfec.write_access && (p2mt == p2m_ram_ro)) ) > + (npfec.write_access && > + ((p2mt == p2m_ram_ro) || (p2mt == p2m_mmio_write_dm))) ) Why are you corrupting indentation here? Furthermore the code you modify here suggests that p2m_ram_ro already has the needed semantics - writes get passed to the DM. None of the other changes you make, and none of the other uses of p2m_ram_ro appear to be in conflict with your intentions, so you'd really need to explain better why you need the new type. > @@ -5922,6 +5923,8 @@ long do_hvm_op(unsigned long op, > XEN_GUEST_HANDLE_PARAM(void) arg) > a.mem_type = HVMMEM_ram_rw; > else if ( p2m_is_grant(t) ) > a.mem_type = HVMMEM_ram_rw; > +else if ( t == p2m_mmio_write_dm ) > +a.mem_type = HVMMEM_mmio_write_dm; > else > a.mem_type = HVMMEM_mmio_dm; > rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0; > @@ -5941,7 +5944,8 @@ long do_hvm_op(unsigned long op, > XEN_GUEST_HANDLE_PARAM(void) arg) > static const p2m_type_t memtype[] = { > [HVMMEM_ram_rw] = p2m_ram_rw, > [HVMMEM_ram_ro] = p2m_ram_ro, > -[HVMMEM_mmio_dm] = p2m_mmio_dm > +[HVMMEM_mmio_dm] = p2m_mmio_dm, > +[HVMMEM_mmio_write_dm] = p2m_mmio_write_dm > }; > > if ( copy_from_guest(&a, arg, 1) ) > @@ -5987,14 +5991,17 @@ long do_hvm_op(unsigned long op, > XEN_GUEST_HANDLE_PARAM(void) arg) > rc = -EAGAIN; > goto param_fail4; > } > + Stray addition of a blank line? > if ( !p2m_is_ram(t) && > - (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) ) > + (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) && > + t != p2m_mmio_write_dm ) Do you really want to permit e.g. transitions between mmio_dm and mmio_write_dm? We should be as restrictive as possible here to not open up paths to security problems. > { > put_gfn(d, pfn); > goto param_fail4; > } > > rc = p2m_change_type_one(d, pfn, t, memtype[a.hvmmem_type]); > + > put_gfn(d, pfn); > if ( rc ) > goto param_fail4; Another stray newline addition. > --- a/xen/include/asm-x86/p2m.h > +++ b/xen/include/asm-x86/p2m.h > @@ -72,6 +72,7 @@ typedef enum { > p2m_ram_shared = 12, /* Shared or sharable memory */ > p2m_ram_broken = 13, /* Broken page, access cause domain crash > */ > p2m_map_foreign = 14,/* ram pages from foreign domain */ > +p2m_mmio_write_dm = 15, /* Read-only; writes go to the device > model */ > } p2m_type_t; > > /* Modifiers to the query */ > If the new type is really needed, shouldn't this get added to P2M_RO_TYPES? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xenstore.h clarifications
On Thu, 2014-11-27 at 12:51 +0200, Razvan Cojocaru wrote: It's a good idea to CC the relevant maintainers if you want their input. > Hello, > > I know that xc_interface_open() can be safely called several times from > the same process, and that several processes can each have a bunch of > xc_interface handles open, and that I shouldn't use an xc_interface > inherited from the parent in a child process, because xenctrl.h says so. > > Is it safe to assume that the same restrictions / conventions apply to > xs_handles obtained via xs_open()? Xenstore.h is not explicit. Looking > at the code, it would seem safe to assume that it can be used in a > similar manner, but it would be nice to have this confirmed if possible. I think there's a pretty good chance that the same applies to xenstore connections made over the device/shared-ring interface. I'm not really sure about the semantics of a Unix domain socket after a fork, but I don't expect both parent and child could sanely make use of it. So I think the answer is: * Connections made with xs_open(0) (which might be shared page or socket based) are only guaranteed to work in the parent after fork. * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable in either the parent or the child after fork, but not both. * xs_daemon_open*() and xs_domain_open() are deprecated synonyms for xs_open(0) * XS_OPEN_READONLY has not bearing on any of this. Ian, does that seem right? Razvan, assuming Ian concurs with the above (or corrects it) then could you knock up a patch to document the result please. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xenstore.h clarifications
On 11/28/2014 11:58 AM, Ian Campbell wrote: > On Thu, 2014-11-27 at 12:51 +0200, Razvan Cojocaru wrote: > > It's a good idea to CC the relevant maintainers if you want their input. > >> Hello, >> >> I know that xc_interface_open() can be safely called several times from >> the same process, and that several processes can each have a bunch of >> xc_interface handles open, and that I shouldn't use an xc_interface >> inherited from the parent in a child process, because xenctrl.h says so. >> >> Is it safe to assume that the same restrictions / conventions apply to >> xs_handles obtained via xs_open()? Xenstore.h is not explicit. Looking >> at the code, it would seem safe to assume that it can be used in a >> similar manner, but it would be nice to have this confirmed if possible. > > I think there's a pretty good chance that the same applies to xenstore > connections made over the device/shared-ring interface. > > I'm not really sure about the semantics of a Unix domain socket after a > fork, but I don't expect both parent and child could sanely make use of > it. > > So I think the answer is: > > * Connections made with xs_open(0) (which might be shared page or > socket based) are only guaranteed to work in the parent after > fork. > * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable > in either the parent or the child after fork, but not both. > * xs_daemon_open*() and xs_domain_open() are deprecated synonyms > for xs_open(0) > * XS_OPEN_READONLY has not bearing on any of this. > > Ian, does that seem right? > > Razvan, assuming Ian concurs with the above (or corrects it) then could > you knock up a patch to document the result please. Sure, I'll document whatever gets confirmed. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] missing chunk of HVM direct kernel boot patch
On Fri, 28 Nov 2014, Chunyan Liu wrote: > Found by Stefano, this chunk of the patch was never applied to > xen-unstable (commit 11dffa2359e8a2629490c14c029c7c7c777b3e47), > see http://marc.info/?l=qemu-devel&m=140471493425353&w=2. > > Signed-off-by: Chunyan Liu I think it should be in 4.5: at the moment the feature is broken and only half-applied to the tree. Reviewed-by: Stefano Stabellini > tools/libxl/libxl_dm.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index 3e191c3..b25b574 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -527,6 +527,15 @@ static char ** > libxl__build_device_model_args_new(libxl__gc *gc, > if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) { > int ioemu_nics = 0; > > +if (b_info->kernel) > +flexarray_vappend(dm_args, "-kernel", b_info->kernel, NULL); > + > +if (b_info->ramdisk) > +flexarray_vappend(dm_args, "-initrd", b_info->ramdisk, NULL); > + > +if (b_info->cmdline) > +flexarray_vappend(dm_args, "-append", b_info->cmdline, NULL); > + > if (b_info->u.hvm.serial || b_info->u.hvm.serial_list) { > if ( b_info->u.hvm.serial && b_info->u.hvm.serial_list ) > { > -- > 1.8.4.5 > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Question about network performance difference between dom0 and host
Hi, I would check whether GRO is enabled under Dom0 or not (ethtool -k ethX). Comparing top during test (especially that which context use how many percentage), and the number of interrupts per second (grep eth /proc/interrupts) would be interesting too. Regards, Zoltan On 28/11/14 02:29, zhangleiqiang wrote: I have done some network performance testing under XEN, and found the result which is somehow strange: When testing between Hosts (Sender and receiver servers are both booting from kernel-default), running only one netperf process will be enough to reach the upper limit of NIC. However, when testing between Dom0s (Sender and receiver servers are both booting from kernel-xen), we must run more count of netperf process to reach the upper limit of NIC, running only one netperf process the throughout was low. The results using one netperf process are as follows: TCP 512TCP 1460 TCP 4096TCP 8500 Hosts:9134.249444.84 9448.05 9447.58(Mbs) Dom0s:2063.93018.95 6561.29 5008.17(Mbs) The question is why one netperf process cannot reach the max throughout of NIC under Dom0 ? I know that Dom0 will have extra overhead when handling interrupt which must be handled by hypervisor first, but I think it is not the reason for it. The testing environment details are as follows: 1. Hardware a. CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 2 CPU 6 Cores with Hyper Thread enabled b. NIC: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) 2. Sofware: a. HostOS: SUSE SLES 11 SP3 (Kernel 3.0.76) b. NIC Driver: IXGBE 3.21.2 c. OVS: 2.1.3 d. MTU: 1600 e. Dom0:6U6G 3. Networking Environment: a. All network flows are transmit/receive through OVS b. Sender server and receiver server are connected directly between 10GE NIC 4. Testing Tools: a. Sender: netperf b. Receiver: netserver -- zhangleiqiang (Trump) Best Regards ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest
>> -if (!hvm_pse1gb_supported(d)) >> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d)) >> *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB); > >With > >#define hvm_pse1gb_supported(d) \ >(cpu_has_page1gb && paging_mode_hap(d)) >the change above is pointless. While, considering this, comments on >v2 may have been misleading, you should have simply updated the patch >description instead to clarify why the v2 change was okay even for the shadow >mode case. I checked the code and found that for a normal guest can only be in hap mode or shadow mode before I sending the patch, but I am not share if it's true for dom0. Liang ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest
On 28/11/14 10:29, Li, Liang Z wrote: >>> -if (!hvm_pse1gb_supported(d)) >>> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d)) >>> *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB); >> With >> >> #define hvm_pse1gb_supported(d) \ >>(cpu_has_page1gb && paging_mode_hap(d)) >> the change above is pointless. While, considering this, comments on >> v2 may have been misleading, you should have simply updated the patch >> description instead to clarify why the v2 change was okay even for the >> shadow mode case. > I checked the code and found that for a normal guest can only be in hap mode > or shadow mode before I sending the patch, but I am not share if it's true > for dom0. > > Liang Dom0 may either be PV (in which case neither hap nor shadow, and cant use 1GB pages anyway), or experimentally PVH which is currently restricted to hap. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest
>>>(cpu_has_page1gb && paging_mode_hap(d)) the change above is >>> pointless. While, considering this, comments on >>> v2 may have been misleading, you should have simply updated the patch >>> description instead to clarify why the v2 change was okay even for the >>> shadow mode case. >> I checked the code and found that for a normal guest can only be in hap mode >> or shadow mode before I sending the patch, but I am not share if it's true >> for dom0. >> >> Liang > > Dom0 may either be PV (in which case neither hap nor shadow, and cant use 1GB > pages anyway), or experimentally PVH which is currently restricted to hap. > >~Andrew Thanks for clarification. I will resend the v2 patch. Liang ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest
>>> On 28.11.14 at 11:29, wrote: >> > -if (!hvm_pse1gb_supported(d)) >>> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d)) >>> *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB); >> >>With >> >>#define hvm_pse1gb_supported(d) \ >>(cpu_has_page1gb && paging_mode_hap(d)) > >>the change above is pointless. While, considering this, comments on >>v2 may have been misleading, you should have simply updated the patch > description instead to clarify why the v2 change was okay even for the shadow > mode case. > > I checked the code and found that for a normal guest can only be in hap mode > or shadow mode before I sending the patch, but I am not share if it's true > for dom0. The CPUID code in libxc doesn't apply to Dom0 at all, and CPUID handling is also special cased in the hypervisor for Dom0. Plus finally Dom0 only possibly being PV or PVH, PVH requiring HAP and PV generally not allowing large pages anyway, your concern is unnecessary. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V4 08/10] xen: Hide get_phys_to_machine() to be able to tune common path
Today get_phys_to_machine() is always called when the mfn for a pfn is to be obtained. Add a wrapper __pfn_to_mfn() as inline function to be able to avoid calling get_phys_to_machine() when possible as soon as the switch to a linear mapped p2m list has been done. Signed-off-by: Juergen Gross Reviewed-by: David Vrabel --- arch/x86/include/asm/xen/page.h | 26 -- arch/x86/xen/mmu.c | 2 +- arch/x86/xen/p2m.c | 6 +++--- 3 files changed, 24 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 28fa795..410a6ec 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -59,6 +59,21 @@ extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops, struct page **pages, unsigned int count); extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn); +/* + * When to use pfn_to_mfn(), __pfn_to_mfn() or get_phys_to_machine(): + * - pfn_to_mfn() returns either INVALID_P2M_ENTRY or the mfn. No indicator + * bits (identity or foreign) are set. + * - __pfn_to_mfn() returns the found entry of the p2m table. A possibly set + * identity or foreign indicator will be still set. __pfn_to_mfn() is + * encapsulating get_phys_to_machine(). + * - get_phys_to_machine() is to be called by __pfn_to_mfn() only to allow + * for future optimizations. + */ +static inline unsigned long __pfn_to_mfn(unsigned long pfn) +{ + return get_phys_to_machine(pfn); +} + static inline unsigned long pfn_to_mfn(unsigned long pfn) { unsigned long mfn; @@ -66,7 +81,7 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn) if (xen_feature(XENFEAT_auto_translated_physmap)) return pfn; - mfn = get_phys_to_machine(pfn); + mfn = __pfn_to_mfn(pfn); if (mfn != INVALID_P2M_ENTRY) mfn &= ~(FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT); @@ -79,7 +94,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn) if (xen_feature(XENFEAT_auto_translated_physmap)) return 1; - return get_phys_to_machine(pfn) != INVALID_P2M_ENTRY; + return __pfn_to_mfn(pfn) != INVALID_P2M_ENTRY; } static inline unsigned long mfn_to_pfn_no_overrides(unsigned long mfn) @@ -113,7 +128,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn) return mfn; pfn = mfn_to_pfn_no_overrides(mfn); - if (get_phys_to_machine(pfn) != mfn) { + if (__pfn_to_mfn(pfn) != mfn) { /* * If this appears to be a foreign mfn (because the pfn * doesn't map back to the mfn), then check the local override @@ -129,8 +144,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn) * entry doesn't map back to the mfn and m2p_override doesn't have a * valid entry for it. */ - if (pfn == ~0 && - get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn)) + if (pfn == ~0 && __pfn_to_mfn(mfn) == IDENTITY_FRAME(mfn)) pfn = mfn; return pfn; @@ -176,7 +190,7 @@ static inline unsigned long mfn_to_local_pfn(unsigned long mfn) return mfn; pfn = mfn_to_pfn(mfn); - if (get_phys_to_machine(pfn) != mfn) + if (__pfn_to_mfn(pfn) != mfn) return -1; /* force !pfn_valid() */ return pfn; } diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 601914d..3e3f8f8 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -387,7 +387,7 @@ static pteval_t pte_pfn_to_mfn(pteval_t val) unsigned long mfn; if (!xen_feature(XENFEAT_auto_translated_physmap)) - mfn = get_phys_to_machine(pfn); + mfn = __pfn_to_mfn(pfn); else mfn = pfn; /* diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index eddec40..8c3d8fb 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -787,7 +787,7 @@ static int m2p_add_override(unsigned long mfn, struct page *page, * because mfn_to_pfn (that ends up being called by GUPF) will * return the backend pfn rather than the frontend pfn. */ pfn = mfn_to_pfn_no_overrides(mfn); - if (get_phys_to_machine(pfn) == mfn) + if (__pfn_to_mfn(pfn) == mfn) set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)); return 0; @@ -967,7 +967,7 @@ static int m2p_remove_override(struct page *page, * pfn again. */ mfn &= ~FOREIGN_FRAME_BIT; pfn = mfn_to_pfn_no_overrides(mfn); - if (get_phys_to_machine(pfn) == FOREIGN_FRAME(mfn) && + if (__pfn_to_mfn(pfn) == FOREIGN_FRAME(mfn) && m2p_find_override(mfn) == NULL) set_phys_to_machine(pfn, mfn); @@ -992,7 +992,7 @@ int clear_foreign_p2m
[Xen-devel] [PATCH V4 00/10] xen: Switch to virtual mapped linear p2m list
Paravirtualized kernels running on Xen use a three level tree for translation of guest specific physical addresses to machine global addresses. This p2m tree is used for construction of page table entries, so the p2m tree walk is performance critical. By using a linear virtual mapped p2m list accesses to p2m elements can be sped up while even simplifying code. To achieve this goal some p2m related initializations have to be performed later in the boot process, as the final p2m list can be set up only after basic memory management functions are available. Changes in V4: - carved out style corrections into new patch 1 as requested by David Vrabel - minor modification in patch 5 (former patch 3) as suggested by David Vrabel - carved out change of page allocation scheme from p2m.c into own patch as requested by Konrad Wilk, corrected for 32 bit as requested by Andrew Cooper - added several comments - simplified error handling in patch 4 (former patch 2) Changes in V3: - Carved out (new) patch 1 to make pure code movement more obvious as requested by David Vrabel - New patch 6 introducing __pfn_to_mfn() (taken from patch 7) as requested by David Vrabel - New patch 8 to speed up set_phys_to_machine() as suggested by David Vrabel Changes in V2: - splitted patch 2 in 4 smaller ones as requested by David Vrabel - added highmem check when remapping kernel memory as requested by David Vrabel Juergen Gross (10): xen: fix some style issues in p2m.c xen: Make functions static xen: use common page allocation function in p2m.c xen: Delay remapping memory of pv-domain xen: Delay m2p_override initialization xen: Delay invalidating extra memory x86: Introduce function to get pmd entry pointer xen: Hide get_phys_to_machine() to be able to tune common path xen: switch to linear virtual mapped sparse p2m list xen: Speed up set_phys_to_machine() by using read-only mappings arch/x86/include/asm/pgtable_types.h |1 + arch/x86/include/asm/xen/page.h | 48 +- arch/x86/mm/pageattr.c | 20 + arch/x86/xen/mmu.c | 38 +- arch/x86/xen/p2m.c | 1320 ++ arch/x86/xen/setup.c | 443 ++-- arch/x86/xen/xen-ops.h |6 +- 7 files changed, 838 insertions(+), 1038 deletions(-) -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V4 03/10] xen: use common page allocation function in p2m.c
In arch/x86/xen/p2m.c three different allocation functions for obtaining a memory page are used: extend_brk(), alloc_bootmem_align() or __get_free_page(). Which of those functions is used depends on the progress of the boot process of the system. Introduce a common allocation routine selecting the to be called allocation routine dynamically based on the boot progress. This allows moving initialization steps without having to care about changing allocation calls. Signed-off-by: Juergen Gross --- arch/x86/xen/mmu.c | 2 ++ arch/x86/xen/p2m.c | 57 +- 2 files changed, 37 insertions(+), 22 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index a8a1a3d..b995b871 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1219,6 +1219,8 @@ static void __init xen_pagetable_init(void) paging_init(); #ifdef CONFIG_X86_64 xen_pagetable_p2m_copy(); +#else + xen_revector_p2m_tree(); #endif /* Allocate and initialize top and mid mfn levels for p2m structure */ xen_build_mfn_list_list(); diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 2d8b908..fa53dc2 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -164,6 +164,7 @@ #include #include #include +#include #include #include @@ -204,6 +205,8 @@ RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER */ RESERVE_BRK(p2m_identity_remap, PAGE_SIZE * 2 * 3 * MAX_REMAP_RANGES); +static int use_brk = 1; + static inline unsigned p2m_top_index(unsigned long pfn) { BUG_ON(pfn >= MAX_P2M_PFN); @@ -268,6 +271,24 @@ static void p2m_init(unsigned long *p2m) p2m[i] = INVALID_P2M_ENTRY; } +static void * __ref alloc_p2m_page(void) +{ + if (unlikely(use_brk)) + return extend_brk(PAGE_SIZE, PAGE_SIZE); + + if (unlikely(!slab_is_available())) + return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + + return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT); +} + +/* Only to be called in case of a race for a page just allocated! */ +static void free_p2m_page(void *p) +{ + BUG_ON(!slab_is_available()); + free_page((unsigned long)p); +} + /* * Build the parallel p2m_top_mfn and p2m_mid_mfn structures * @@ -287,13 +308,13 @@ void __ref xen_build_mfn_list_list(void) /* Pre-initialize p2m_top_mfn to be completely missing */ if (p2m_top_mfn == NULL) { - p2m_mid_missing_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + p2m_mid_missing_mfn = alloc_p2m_page(); p2m_mid_mfn_init(p2m_mid_missing_mfn, p2m_missing); - p2m_top_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + p2m_top_mfn_p = alloc_p2m_page(); p2m_top_mfn_p_init(p2m_top_mfn_p); - p2m_top_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + p2m_top_mfn = alloc_p2m_page(); p2m_top_mfn_init(p2m_top_mfn); } else { /* Reinitialise, mfn's all change after migration */ @@ -327,7 +348,7 @@ void __ref xen_build_mfn_list_list(void) * missing parts of the mfn tree after * runtime. */ - mid_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + mid_mfn_p = alloc_p2m_page(); p2m_mid_mfn_init(mid_mfn_p, p2m_missing); p2m_top_mfn_p[topidx] = mid_mfn_p; @@ -364,17 +385,17 @@ void __init xen_build_dynamic_phys_to_machine(void) max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages); xen_max_p2m_pfn = max_pfn; - p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE); + p2m_missing = alloc_p2m_page(); p2m_init(p2m_missing); - p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE); + p2m_identity = alloc_p2m_page(); p2m_init(p2m_identity); - p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE); + p2m_mid_missing = alloc_p2m_page(); p2m_mid_init(p2m_mid_missing, p2m_missing); - p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE); + p2m_mid_identity = alloc_p2m_page(); p2m_mid_init(p2m_mid_identity, p2m_identity); - p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE); + p2m_top = alloc_p2m_page(); p2m_top_init(p2m_top); /* @@ -387,7 +408,7 @@ void __init xen_build_dynamic_phys_to_machine(void) unsigned mididx = p2m_mid_index(pfn); if (p2m_top[topidx] == p2m_mid_missing) { - unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE); + unsigned long **mid = alloc_p2m_page(); p2m_mid_init(mid, p2m_missing); p2m_top[topidx] = mid; @@ -420,6 +441,7 @@ unsigned long __init xen_revector_p2m_tree(void) unsigned long *mfn_list = N
[Xen-devel] [PATCH V4 05/10] xen: Delay m2p_override initialization
The m2p overrides are used to be able to find the local pfn for a foreign mfn mapped into the domain. They are used by driver backends having to access frontend data. As this functionality isn't used in early boot it makes no sense to initialize the m2p override functions very early. It can be done later without doing any harm, removing the need for allocating memory via extend_brk(). While at it make some m2p override functions static as they are only used internally. Signed-off-by: Juergen Gross Reviewed-by: David Vrabel Reviewed-by: Konrad Rzeszutek Wilk --- arch/x86/xen/p2m.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 24cd9d1..8676f35 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -428,8 +428,6 @@ void __init xen_build_dynamic_phys_to_machine(void) } p2m_top[topidx][mididx] = &mfn_list[pfn]; } - - m2p_override_init(); } #ifdef CONFIG_X86_64 unsigned long __init xen_revector_p2m_tree(void) @@ -500,13 +498,15 @@ unsigned long __init xen_revector_p2m_tree(void) } /* This should be the leafs allocated for identity from _brk. */ } - return (unsigned long)mfn_list; + m2p_override_init(); + return (unsigned long)mfn_list; } #else unsigned long __init xen_revector_p2m_tree(void) { use_brk = 0; + m2p_override_init(); return 0; } #endif @@ -796,15 +796,16 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) #define M2P_OVERRIDE_HASH_SHIFT10 #define M2P_OVERRIDE_HASH (1 << M2P_OVERRIDE_HASH_SHIFT) -static RESERVE_BRK_ARRAY(struct list_head, m2p_overrides, M2P_OVERRIDE_HASH); +static struct list_head *m2p_overrides; static DEFINE_SPINLOCK(m2p_override_lock); static void __init m2p_override_init(void) { unsigned i; - m2p_overrides = extend_brk(sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH, - sizeof(unsigned long)); + m2p_overrides = alloc_bootmem_align( + sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH, + sizeof(unsigned long)); for (i = 0; i < M2P_OVERRIDE_HASH; i++) INIT_LIST_HEAD(&m2p_overrides[i]); @@ -932,10 +933,14 @@ EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping); static struct page *m2p_find_override(unsigned long mfn) { unsigned long flags; - struct list_head *bucket = &m2p_overrides[mfn_hash(mfn)]; + struct list_head *bucket; struct page *p, *ret; + if (unlikely(!m2p_overrides)) + return NULL; + ret = NULL; + bucket = &m2p_overrides[mfn_hash(mfn)]; spin_lock_irqsave(&m2p_override_lock, flags); -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V4 2/9] xen: Make functions static
Some functions in arch/x86/xen/p2m.c are used locally only. Make them static. Rearrange the functions in p2m.c to avoid forward declarations. Signed-off-by: Juergen Gross --- arch/x86/include/asm/xen/page.h | 6 - arch/x86/xen/p2m.c | 346 2 files changed, 172 insertions(+), 180 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index c949923..6c16451 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -52,15 +52,9 @@ extern unsigned long set_phys_range_identity(unsigned long pfn_s, extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, struct page **pages, unsigned int count); -extern int m2p_add_override(unsigned long mfn, struct page *page, - struct gnttab_map_grant_ref *kmap_op); extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops, struct gnttab_map_grant_ref *kmap_ops, struct page **pages, unsigned int count); -extern int m2p_remove_override(struct page *page, - struct gnttab_map_grant_ref *kmap_op, - unsigned long mfn); -extern struct page *m2p_find_override(unsigned long mfn); extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn); static inline unsigned long pfn_to_mfn(unsigned long pfn) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 73d354a..2d8b908 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -896,6 +896,61 @@ static unsigned long mfn_hash(unsigned long mfn) return hash_long(mfn, M2P_OVERRIDE_HASH_SHIFT); } +/* Add an MFN override for a particular page */ +static int m2p_add_override(unsigned long mfn, struct page *page, + struct gnttab_map_grant_ref *kmap_op) +{ + unsigned long flags; + unsigned long pfn; + unsigned long uninitialized_var(address); + unsigned level; + pte_t *ptep = NULL; + + pfn = page_to_pfn(page); + if (!PageHighMem(page)) { + address = (unsigned long)__va(pfn << PAGE_SHIFT); + ptep = lookup_address(address, &level); + if (WARN(ptep == NULL || level != PG_LEVEL_4K, +"m2p_add_override: pfn %lx not mapped", pfn)) + return -EINVAL; + } + + if (kmap_op != NULL) { + if (!PageHighMem(page)) { + struct multicall_space mcs = + xen_mc_entry(sizeof(*kmap_op)); + + MULTI_grant_table_op(mcs.mc, + GNTTABOP_map_grant_ref, kmap_op, 1); + + xen_mc_issue(PARAVIRT_LAZY_MMU); + } + } + spin_lock_irqsave(&m2p_override_lock, flags); + list_add(&page->lru, &m2p_overrides[mfn_hash(mfn)]); + spin_unlock_irqrestore(&m2p_override_lock, flags); + + /* p2m(m2p(mfn)) == mfn: the mfn is already present somewhere in +* this domain. Set the FOREIGN_FRAME_BIT in the p2m for the other +* pfn so that the following mfn_to_pfn(mfn) calls will return the +* pfn from the m2p_override (the backend pfn) instead. +* We need to do this because the pages shared by the frontend +* (xen-blkfront) can be already locked (lock_page, called by +* do_read_cache_page); when the userspace backend tries to use them +* with direct_IO, mfn_to_pfn returns the pfn of the frontend, so +* do_blockdev_direct_IO is going to try to lock the same pages +* again resulting in a deadlock. +* As a side effect get_user_pages_fast might not be safe on the +* frontend pages while they are being shared with the backend, +* because mfn_to_pfn (that ends up being called by GUPF) will +* return the backend pfn rather than the frontend pfn. */ + pfn = mfn_to_pfn_no_overrides(mfn); + if (get_phys_to_machine(pfn) == mfn) + set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)); + + return 0; +} + int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, struct page **pages, unsigned int count) @@ -955,61 +1010,123 @@ out: } EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping); -/* Add an MFN override for a particular page */ -int m2p_add_override(unsigned long mfn, struct page *page, - struct gnttab_map_grant_ref *kmap_op) -{ - unsigned long flags; - unsigned long pfn; - unsigned long uninitialized_var(address); - unsigned level; - pte_t *ptep = NULL; - - pfn = page_to_pfn(page); - if (!PageHighMem(page)) { -
[Xen-devel] [PATCH V4 1/9] xen: fix some style issues in p2m.c
The source arch/x86/xen/p2m.c has some coding style issues. Fix them. Signed-off-by: Juergen Gross --- arch/x86/xen/p2m.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 9201a38..73d354a 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -922,7 +922,7 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops, continue; if (map_ops[i].flags & GNTMAP_contains_pte) { - pte = (pte_t *) (mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) + + pte = (pte_t *)(mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) + (map_ops[i].host_addr & ~PAGE_MASK)); mfn = pte_mfn(*pte); } else { @@ -970,7 +970,7 @@ int m2p_add_override(unsigned long mfn, struct page *page, address = (unsigned long)__va(pfn << PAGE_SHIFT); ptep = lookup_address(address, &level); if (WARN(ptep == NULL || level != PG_LEVEL_4K, - "m2p_add_override: pfn %lx not mapped", pfn)) +"m2p_add_override: pfn %lx not mapped", pfn)) return -EINVAL; } @@ -1072,7 +1072,7 @@ int m2p_remove_override(struct page *page, ptep = lookup_address(address, &level); if (WARN(ptep == NULL || level != PG_LEVEL_4K, - "m2p_remove_override: pfn %lx not mapped", pfn)) +"m2p_remove_override: pfn %lx not mapped", pfn)) return -EINVAL; } @@ -1102,9 +1102,8 @@ int m2p_remove_override(struct page *page, * hypercall actually returned an error. */ if (kmap_op->handle == GNTST_general_error) { - printk(KERN_WARNING "m2p_remove_override: " - "pfn %lx mfn %lx, failed to modify kernel mappings", - pfn, mfn); + pr_warn("m2p_remove_override: pfn %lx mfn %lx, failed to modify kernel mappings", + pfn, mfn); put_balloon_scratch_page(); return -1; } @@ -1112,14 +,14 @@ int m2p_remove_override(struct page *page, xen_mc_batch(); mcs = __xen_mc_entry( - sizeof(struct gnttab_unmap_and_replace)); + sizeof(struct gnttab_unmap_and_replace)); unmap_op = mcs.args; unmap_op->host_addr = kmap_op->host_addr; unmap_op->new_addr = scratch_page_address; unmap_op->handle = kmap_op->handle; MULTI_grant_table_op(mcs.mc, - GNTTABOP_unmap_and_replace, unmap_op, 1); + GNTTABOP_unmap_and_replace, unmap_op, 1); mcs = __xen_mc_entry(0); MULTI_update_va_mapping(mcs.mc, scratch_page_address, -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V4 09/10] xen: switch to linear virtual mapped sparse p2m list
At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of "invalid mfn" entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 -> 32:35 System: 18:07 -> 17:47 User:104:00 -> 103:30 Tested with following configurations: - 64 bit dom0, 8GB RAM - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without the patch) - 32 bit domU, ballooning up and down - 32 bit domU, save and restore - 32 bit domU with PCI passthrough - 64 bit domU, 8 GB, 2049 MB, 5000 MB - 64 bit domU, ballooning up and down - 64 bit domU, save and restore - 64 bit domU with PCI passthrough Signed-off-by: Juergen Gross --- arch/x86/include/asm/xen/page.h | 20 +- arch/x86/xen/mmu.c | 34 +- arch/x86/xen/p2m.c | 735 +--- arch/x86/xen/xen-ops.h | 2 +- 4 files changed, 347 insertions(+), 444 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 410a6ec..f5d5de4 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -65,13 +65,25 @@ extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn) * bits (identity or foreign) are set. * - __pfn_to_mfn() returns the found entry of the p2m table. A possibly set * identity or foreign indicator will be still set. __pfn_to_mfn() is - * encapsulating get_phys_to_machine(). - * - get_phys_to_machine() is to be called by __pfn_to_mfn() only to allow - * for future optimizations. + * encapsulating get_phys_to_machine() which is called in special cases only. + * - get_phys_to_machine() is to be called by __pfn_to_mfn() only in special + * cases needing an extended handling. */ static inline unsigned long __pfn_to_mfn(unsigned long pfn) { - return get_phys_to_machine(pfn); + unsigned long mfn; + + if (pfn < xen_p2m_size) + mfn = xen_p2m_addr[pfn]; + else if (unlikely(pfn < xen_max_p2m_pfn)) + return get_phys_to_machine(pfn); + else + return IDENTITY_FRAME(pfn); + + if (unlikely(mfn == INVALID_P2M_ENTRY)) + return get_phys_to_machine(pfn); + + return mfn; } static inline unsigned long pfn_to_mfn(unsigned long pfn) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 3e3f8f8..6ab6150 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1158,20 +1158,16 @@ static void __init xen_cleanhighmap(unsigned long vaddr, * instead of somewhere later and be confusing. */ xen_mc_flush(); } -static void __init xen_pagetable_p2m_copy(void) + +static void __init xen_pagetable_p2m_free(void) { unsigned long size; unsigned long addr; - unsigned long new_mfn_list; - - if (xen_feature(XENFEAT_auto_translated_physmap)) - return; size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long)); - new_mfn_list = xen_revector_p2m_tree(); /* No memory or already called. */ - if (!new_mfn_list || new_mfn_list == xen_start_info->mfn_list) + if ((unsigned long)xen_p2m_addr == xen_start_info->mfn_list) return; /* using __ka address and sticking INVALID_P2M_ENTRY! */ @@ -1189,8 +1185,6 @@ static void __init xen_pagetable_p2m_copy(void) size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long)); memblock_free(__pa(xen_start_info->mfn_list), size); - /* And revector! Bye bye old array */ - xen_start_info->mfn_list = new_mfn_list; /* At this stage, cleanup_highmap has already cleaned __ka space * from _brk_limit way up to the max_pfn_mapped (which is the end of @@ -1214,14 +1208,26 @@ static void __init xen_pagetable_p2m_copy(void) } #endif -static void __init xen_pagetable_init(void) +static void __init xen_pagetable_p2m_setup(void) { - paging_init(); + if (xen_feature(XENFEAT_auto_translated_physmap)) + return; + + xen_vmalloc_p2m_tree(); + #ifdef CONFIG_X86_64 - xen_pagetable_p2m_copy(); -#else - xen_revector_p2m_tree(); + xen_pagetable_p2m_free()
[Xen-devel] [PATCH V4 04/10] xen: Delay remapping memory of pv-domain
Early in the boot process the memory layout of a pv-domain is changed to match the E820 map (either the host one for Dom0 or the Xen one) regarding placement of RAM and PCI holes. This requires removing memory pages initially located at positions not suitable for RAM and adding them later at higher addresses where no restrictions apply. To be able to operate on the hypervisor supported p2m list until a virtual mapped linear p2m list can be constructed, remapping must be delayed until virtual memory management is initialized, as the initial p2m list can't be extended unlimited at physical memory initialization time due to it's fixed structure. A further advantage is the reduction in complexity and code volume as we don't have to be careful regarding memory restrictions during p2m updates. Signed-off-by: Juergen Gross Reviewed-by: David Vrabel --- arch/x86/include/asm/xen/page.h | 1 - arch/x86/xen/mmu.c | 4 + arch/x86/xen/p2m.c | 94 -- arch/x86/xen/setup.c| 369 ++-- arch/x86/xen/xen-ops.h | 1 + 5 files changed, 172 insertions(+), 297 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 6c16451..b475297 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -44,7 +44,6 @@ extern unsigned long machine_to_phys_nr; extern unsigned long get_phys_to_machine(unsigned long pfn); extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn); -extern bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn); extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn); extern unsigned long set_phys_range_identity(unsigned long pfn_s, unsigned long pfn_e); diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index b995b871..601914d 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1225,6 +1225,10 @@ static void __init xen_pagetable_init(void) /* Allocate and initialize top and mid mfn levels for p2m structure */ xen_build_mfn_list_list(); + /* Remap memory freed due to conflicts with E820 map */ + if (!xen_feature(XENFEAT_auto_translated_physmap)) + xen_remap_memory(); + xen_setup_shared_info(); xen_post_allocator_init(); } diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index fa53dc2..24cd9d1 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -662,100 +662,6 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn) return true; } -/* - * Skim over the P2M tree looking at pages that are either filled with - * INVALID_P2M_ENTRY or with 1:1 PFNs. If found, re-use that page and - * replace the P2M leaf with a p2m_missing or p2m_identity. - * Stick the old page in the new P2M tree location. - */ -static bool __init early_can_reuse_p2m_middle(unsigned long set_pfn) -{ - unsigned topidx; - unsigned mididx; - unsigned ident_pfns; - unsigned inv_pfns; - unsigned long *p2m; - unsigned idx; - unsigned long pfn; - - /* We only look when this entails a P2M middle layer */ - if (p2m_index(set_pfn)) - return false; - - for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) { - topidx = p2m_top_index(pfn); - - if (!p2m_top[topidx]) - continue; - - if (p2m_top[topidx] == p2m_mid_missing) - continue; - - mididx = p2m_mid_index(pfn); - p2m = p2m_top[topidx][mididx]; - if (!p2m) - continue; - - if ((p2m == p2m_missing) || (p2m == p2m_identity)) - continue; - - if ((unsigned long)p2m == INVALID_P2M_ENTRY) - continue; - - ident_pfns = 0; - inv_pfns = 0; - for (idx = 0; idx < P2M_PER_PAGE; idx++) { - /* IDENTITY_PFNs are 1:1 */ - if (p2m[idx] == IDENTITY_FRAME(pfn + idx)) - ident_pfns++; - else if (p2m[idx] == INVALID_P2M_ENTRY) - inv_pfns++; - else - break; - } - if ((ident_pfns == P2M_PER_PAGE) || (inv_pfns == P2M_PER_PAGE)) - goto found; - } - return false; -found: - /* Found one, replace old with p2m_identity or p2m_missing */ - p2m_top[topidx][mididx] = (ident_pfns ? p2m_identity : p2m_missing); - - /* Reset where we want to stick the old page in. */ - topidx = p2m_top_index(set_pfn); - mididx = p2m_mid_index(set_pfn); - - /* This shouldn't happen */ - if (WARN_ON(p2m_top[topidx] == p2m_mid_missing)) - early_alloc_p2m_middle(set_pfn); -
[Xen-devel] [PATCH V4 07/10] x86: Introduce function to get pmd entry pointer
Introduces lookup_pmd_address() to get the address of the pmd entry related to a virtual address in the current address space. This function is needed for support of a virtual mapped sparse p2m list in xen pv domains, as we need the address of the pmd entry, not the one of the pte in that case. Signed-off-by: Juergen Gross --- arch/x86/include/asm/pgtable_types.h | 1 + arch/x86/mm/pageattr.c | 20 2 files changed, 21 insertions(+) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 0778964..d83f5e7 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -396,6 +396,7 @@ static inline void update_page_count(int level, unsigned long pages) { } extern pte_t *lookup_address(unsigned long address, unsigned int *level); extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, unsigned int *level); +extern pmd_t *lookup_pmd_address(unsigned long address); extern phys_addr_t slow_virt_to_phys(void *__address); extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address, unsigned numpages, unsigned long page_flags); diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 36de293..1298108 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -384,6 +384,26 @@ static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address, } /* + * Lookup the PMD entry for a virtual address. Return a pointer to the entry + * or NULL if not present. + */ +pmd_t *lookup_pmd_address(unsigned long address) +{ + pgd_t *pgd; + pud_t *pud; + + pgd = pgd_offset_k(address); + if (pgd_none(*pgd)) + return NULL; + + pud = pud_offset(pgd, address); + if (pud_none(*pud) || pud_large(*pud) || !pud_present(*pud)) + return NULL; + + return pmd_offset(pud, address); +} + +/* * This is necessary because __pa() does not work on some * kinds of memory, like vmalloc() or the alloc_remap() * areas on 32-bit NUMA systems. The percpu areas can -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V4 06/10] xen: Delay invalidating extra memory
When the physical memory configuration is initialized the p2m entries for not pouplated memory pages are set to "invalid". As those pages are beyond the hypervisor built p2m list the p2m tree has to be extended. This patch delays processing the extra memory related p2m entries during the boot process until some more basic memory management functions are callable. This removes the need to create new p2m entries until virtual memory management is available. Signed-off-by: Juergen Gross Reviewed-by: David Vrabel --- arch/x86/include/asm/xen/page.h | 3 + arch/x86/xen/p2m.c | 128 arch/x86/xen/setup.c| 98 ++ arch/x86/xen/xen-ops.h | 3 +- 4 files changed, 103 insertions(+), 129 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index b475297..28fa795 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -41,6 +41,9 @@ typedef struct xpaddr { extern unsigned long *machine_to_phys_mapping; extern unsigned long machine_to_phys_nr; +extern unsigned long *xen_p2m_addr; +extern unsigned long xen_p2m_size; +extern unsigned long xen_max_p2m_pfn; extern unsigned long get_phys_to_machine(unsigned long pfn); extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn); diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 8676f35..eddec40 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -181,7 +181,12 @@ static void __init m2p_override_init(void); +unsigned long *xen_p2m_addr __read_mostly; +EXPORT_SYMBOL_GPL(xen_p2m_addr); +unsigned long xen_p2m_size __read_mostly; +EXPORT_SYMBOL_GPL(xen_p2m_size); unsigned long xen_max_p2m_pfn __read_mostly; +EXPORT_SYMBOL_GPL(xen_max_p2m_pfn); static unsigned long *p2m_mid_missing_mfn; static unsigned long *p2m_top_mfn; @@ -198,13 +203,6 @@ static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_identity, P2M_MID_PER_PAGE); RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE))); -/* For each I/O range remapped we may lose up to two leaf pages for the boundary - * violations and three mid pages to cover up to 3GB. With - * early_can_reuse_p2m_middle() most of the leaf pages will be reused by the - * remapped region. - */ -RESERVE_BRK(p2m_identity_remap, PAGE_SIZE * 2 * 3 * MAX_REMAP_RANGES); - static int use_brk = 1; static inline unsigned p2m_top_index(unsigned long pfn) @@ -381,9 +379,11 @@ void __init xen_build_dynamic_phys_to_machine(void) if (xen_feature(XENFEAT_auto_translated_physmap)) return; + xen_p2m_addr = (unsigned long *)xen_start_info->mfn_list; mfn_list = (unsigned long *)xen_start_info->mfn_list; max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages); xen_max_p2m_pfn = max_pfn; + xen_p2m_size = max_pfn; p2m_missing = alloc_p2m_page(); p2m_init(p2m_missing); @@ -499,6 +499,11 @@ unsigned long __init xen_revector_p2m_tree(void) /* This should be the leafs allocated for identity from _brk. */ } + xen_p2m_size = xen_max_p2m_pfn; + xen_p2m_addr = mfn_list; + + xen_inv_extra_mem(); + m2p_override_init(); return (unsigned long)mfn_list; } @@ -506,6 +511,8 @@ unsigned long __init xen_revector_p2m_tree(void) unsigned long __init xen_revector_p2m_tree(void) { use_brk = 0; + xen_p2m_size = xen_max_p2m_pfn; + xen_inv_extra_mem(); m2p_override_init(); return 0; } @@ -514,8 +521,12 @@ unsigned long get_phys_to_machine(unsigned long pfn) { unsigned topidx, mididx, idx; - if (unlikely(pfn >= MAX_P2M_PFN)) + if (unlikely(pfn >= xen_p2m_size)) { + if (pfn < xen_max_p2m_pfn) + return xen_chk_extra_mem(pfn); + return IDENTITY_FRAME(pfn); + } topidx = p2m_top_index(pfn); mididx = p2m_mid_index(pfn); @@ -613,78 +624,12 @@ static bool alloc_p2m(unsigned long pfn) return true; } -static bool __init early_alloc_p2m(unsigned long pfn, bool check_boundary) -{ - unsigned topidx, mididx, idx; - unsigned long *p2m; - - topidx = p2m_top_index(pfn); - mididx = p2m_mid_index(pfn); - idx = p2m_index(pfn); - - /* Pfff.. No boundary cross-over, lets get out. */ - if (!idx && check_boundary) - return false; - - WARN(p2m_top[topidx][mididx] == p2m_identity, - "P2M[%d][%d] == IDENTITY, should be MISSING (or alloced)!\n", - topidx, mididx); - - /* -* Could be done by xen_build_dynamic_phys_to_machine.. -*/ - if (p2m_top[topidx][mididx] != p2m_missing) - return false; - - /* Boundary cross-over for the edges: */ - p2m = alloc_p2m_page(); - - p2m_init(p2m); - - p2m_top[topidx][mididx] = p2m; - -
[Xen-devel] [PATCH V4 10/10] xen: Speed up set_phys_to_machine() by using read-only mappings
Instead of checking at each call of set_phys_to_machine() whether a new p2m page has to be allocated due to writing an entry in a large invalid or identity area, just map those areas read only and react to a page fault on write by allocating the new page. This change will make the common path with no allocation much faster as it only requires a single write of the new mfn instead of walking the address translation tables and checking for the special cases. Suggested-by: David Vrabel Signed-off-by: Juergen Gross Reviewed-by: David Vrabel Reviewed-by: Konrad Rzeszutek Wilk --- arch/x86/xen/p2m.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 7d84473..8b5db51 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -70,6 +70,7 @@ #include #include +#include #include #include @@ -316,9 +317,9 @@ static void __init xen_rebuild_p2m_list(unsigned long *p2m) paravirt_alloc_pte(&init_mm, __pa(p2m_identity_pte) >> PAGE_SHIFT); for (i = 0; i < PTRS_PER_PTE; i++) { set_pte(p2m_missing_pte + i, - pfn_pte(PFN_DOWN(__pa(p2m_missing)), PAGE_KERNEL)); + pfn_pte(PFN_DOWN(__pa(p2m_missing)), PAGE_KERNEL_RO)); set_pte(p2m_identity_pte + i, - pfn_pte(PFN_DOWN(__pa(p2m_identity)), PAGE_KERNEL)); + pfn_pte(PFN_DOWN(__pa(p2m_identity)), PAGE_KERNEL_RO)); } for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += chunk) { @@ -365,7 +366,7 @@ static void __init xen_rebuild_p2m_list(unsigned long *p2m) p2m_missing : p2m_identity; ptep = populate_extra_pte((unsigned long)(p2m + pfn)); set_pte(ptep, - pfn_pte(PFN_DOWN(__pa(mfns)), PAGE_KERNEL)); + pfn_pte(PFN_DOWN(__pa(mfns)), PAGE_KERNEL_RO)); continue; } @@ -624,6 +625,9 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) return true; } + if (likely(!__put_user(mfn, xen_p2m_addr + pfn))) + return true; + ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level); BUG_ON(!ptep || level != PG_LEVEL_4K); @@ -633,9 +637,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) if (pte_pfn(*ptep) == PFN_DOWN(__pa(p2m_identity))) return mfn == IDENTITY_FRAME(pfn); - xen_p2m_addr[pfn] = mfn; - - return true; + return false; } bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest
If hardware support the 1GB pages, expose the feature to guest by default. Users don't have to use a 'cpuid= ' option in config fil e to turn it on. If guest use shadow mode, the 1GB pages feature will be hidden from guest, this is done in the function hvm_cpuid(). So the change is okay for shadow mode case. Signed-off-by: Liang Li Signed-off-by: Yang Zhang --- tools/libxc/xc_cpuid_x86.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index a18b1ff..c97f91a 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -109,6 +109,7 @@ static void amd_xc_cpuid_policy( regs[3] &= (0x0183f3ff | /* features shared with 0x0001:EDX */ bitmaskof(X86_FEATURE_NX) | bitmaskof(X86_FEATURE_LM) | +bitmaskof(X86_FEATURE_PAGE1GB) | bitmaskof(X86_FEATURE_SYSCALL) | bitmaskof(X86_FEATURE_MP) | bitmaskof(X86_FEATURE_MMXEXT) | @@ -192,6 +193,7 @@ static void intel_xc_cpuid_policy( bitmaskof(X86_FEATURE_ABM)); regs[3] &= (bitmaskof(X86_FEATURE_NX) | bitmaskof(X86_FEATURE_LM) | +bitmaskof(X86_FEATURE_PAGE1GB) | bitmaskof(X86_FEATURE_SYSCALL) | bitmaskof(X86_FEATURE_RDTSCP)); break; @@ -386,6 +388,7 @@ static void xc_cpuid_hvm_policy( clear_bit(X86_FEATURE_LM, regs[3]); clear_bit(X86_FEATURE_NX, regs[3]); clear_bit(X86_FEATURE_PSE36, regs[3]); +clear_bit(X86_FEATURE_PAGE1GB, regs[3]); } break; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] flask/policy: Updates for example XSM policy
On Wed, 2014-09-24 at 10:14 +0100, Ian Campbell wrote: > On Tue, 2014-09-23 at 16:40 -0400, Daniel De Graaf wrote: > > On 09/23/2014 11:30 AM, Ian Campbell wrote: > > > On Tue, 2014-09-23 at 10:37 +0100, Wei Liu wrote: > > >> On Tue, Sep 23, 2014 at 10:01:48AM +0100, Wei Liu wrote: > > >>> On Mon, Sep 22, 2014 at 04:23:18PM -0400, Daniel De Graaf wrote: > > The example XSM policy was missing permission for dom0_t to migrate > > domains with label domU_t; add these permissions. > > > > > > Daniel, would you prefer to iterate until a full batch of fixes or shall > > > I apply and expect "More updates for example XSM policy" later on? > > > > > > > I would prefer to iterate and apply the full set. > > Ack. I've just spotted this in my queue, did this full set ever happen? I don't see it in tree of in my queue folder. Maybe the issue went away some other way? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] pygrub: Fix regression from c/s d1b93ea, attempt 2
On Tue, 2014-11-25 at 11:11 -0500, Boris Ostrovsky wrote: > c/s d1b93ea causes substantial functional regressions in pygrub's ability to > parse bootloader configuration files. > > c/s d1b93ea itself changed an an interface which previously used exclusively > integers, to using strings in the case of a grub configuration with explicit > default set, along with changing the code calling the interface to require a > string. The default value for "default" remained as an integer. > > As a result, any Extlinux or Lilo configuration (which drives this interface > exclusively with integers), or Grub configuration which doesn't explicitly > declare a default will die with an AttributeError when attempting to call > "self.cf.default.isdigit()" where "default" is an integer. > > Sadly, this AttributeError gets swallowed by the blanket ignore in the loop > which searches partitions for valid bootloader configurations, causing the > issue to be reported as "Unable to find partition containing kernel" > > We should explicitly check type of "default" in image_index() and process it > appropriately. > > Reported-by: Andrew Cooper > Signed-off-by: Boris Ostrovsky Acked-by: Ian Campbell In general I would prefer the first line of the commit message to be a short description of the actual functional change and not a reference to fixing some other commit, which is basically meaningless to anyone reading the log even now, nevermind in six months. I think I'm going to start picking up on this more frequently from now on. CCing Konrad for RM input. I'm much less worried about corner cases etc wrt the freeze etc than I was with the larger fix proposed earlier. > --- > > Commit message is Andrew's with exception of the last sentense. > > I only tested this with grub2. > > -boris > > tools/pygrub/src/pygrub |4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > mode change 100644 => 100755 tools/pygrub/src/pygrub > > diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub > old mode 100644 > new mode 100755 > index aa7e562..3ec52fd > --- a/tools/pygrub/src/pygrub > +++ b/tools/pygrub/src/pygrub > @@ -457,7 +457,9 @@ class Grub: > self.cf.parse(buf) > > def image_index(self): > -if self.cf.default.isdigit(): > +if isinstance(self.cf.default, int): > +sel = self.cf.default > +elif self.cf.default.isdigit(): > sel = int(self.cf.default) > else: > # We don't fully support submenus. Look for the leaf value in ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest
>>> On 28.11.14 at 11:52, wrote: > If hardware support the 1GB pages, expose the feature to guest by > default. Users don't have to use a 'cpuid= ' option in config fil > e to turn it on. > > If guest use shadow mode, the 1GB pages feature will be hidden from > guest, this is done in the function hvm_cpuid(). So the change is > okay for shadow mode case. > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang Reviewed-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 3/3] python/xs: Correct the indirection of the NULL xshandle() check
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote: > The code now now matches its comment, and will actually catch the case of a > bad xs handle. > > Signed-off-by: Andrew Cooper > Coverity-ID: 1055948 > CC: Ian Campbell > CC: Ian Jackson > CC: Wei Liu > CC: Xen Coverity Team Acked-by: Ian Campbell > --- > tools/python/xen/lowlevel/xs/xs.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/python/xen/lowlevel/xs/xs.c > b/tools/python/xen/lowlevel/xs/xs.c > index 84e1711..ec364bb 100644 > --- a/tools/python/xen/lowlevel/xs/xs.c > +++ b/tools/python/xen/lowlevel/xs/xs.c > @@ -816,7 +816,7 @@ static int parse_transaction_path(XsHandle *self, > PyObject *args, > > *xh = xshandle(self); > > -if (!xh) > +if (!*xh) > return 0; > > if (!PyArg_ParseTuple(args, "ss", &thstr, path)) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Xen-4.5 HVMOP ABI issues
Hi, I have finally worked out why XenServer is having issues with its legacy windows drivers and Xen-4.5. XenServer needs to support hvm ops with an indicies of 0x102 and 0x103 to prevent our legacy windows drivers from BSODing on boot. (These problems will disappear when we can drop Windows XP/Server 2k3 support, but that time is not now.) The root cause of the breakage is c/s e8b87b57b "x86/HVM: fix preemption handling in do_hvm_op() (try 2)", and in particular the HVMOP_op_mask, which turns the above hypercalls into HVMOP_set_{pci_intx,isa_irq}_level hypercalls. >From my point of view, I can work around this with the knowledge that HVMOP_set_{pci_intx,isa_irq}_level hypercalls are not eligible for pre-emption. However, it occurs to me that HVMOP_op_mask absolutely must live in the public header files, as it is guest visible, and forms part of the ABI. Consider a userspace task which falls into a continuation, is preempted and paused by the guest kernel, the entire VM migrated, and the task unpaused later. If HVMOP_op_mask has changed in that time, the continuation will become erroneous. In particular, all hypercall continuation strategies we use *must* be forward compatible when moving to newer versions of Xen, because Xen cannot possibly guarantee that a continuation which starts will finish on the same hypervisor. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xenstore.h clarifications
Ian Campbell writes ("Re: [Xen-devel] Xenstore.h clarifications"): > I think there's a pretty good chance that the same applies to xenstore > connections made over the device/shared-ring interface. Yes. > I'm not really sure about the semantics of a Unix domain socket after a > fork, but I don't expect both parent and child could sanely make use of > it. >From the kernel's point of view everything is fine but of course the protocol running through the socket would be quite messed up. For xenstore, while in theory you could take turns somehow, I don't think suggesting that is sensible. > So I think the answer is: > > * Connections made with xs_open(0) (which might be shared page or > socket based) are only guaranteed to work in the parent after > fork. > * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable > in either the parent or the child after fork, but not both. > * xs_daemon_open*() and xs_domain_open() are deprecated synonyms > for xs_open(0) > * XS_OPEN_READONLY has not bearing on any of this. > > Ian, does that seem right? Exactly, yes. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 1/3] python/xc: Fix multiple issues in pyflask_context_to_sid()
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote: > The error handling from a failed memory allocation should return > PyErr_SetFromErrno(xc_error_obj); rather than simply calling it and continuing > to the memcpy() below, with the dest pointer being NULL. > > Furthermore, the context string is simply an input parameter to the hypercall, > and is not mutated anywhere along the way. The error handling elsewhere in > the function can be simplified by not duplicating it to start with. > > Signed-off-by: Andrew Cooper > Coverity-IDs: 1055305 1055721 > CC: Ian Campbell > CC: Ian Jackson > CC: Wei Liu > CC: Xen Coverity Team Acked-by: Ian Campbell This would have been far more obviously correct for 4.5 if you had stuck to fixing the issue in the first paragraph. > --- > tools/python/xen/lowlevel/xc/xc.c | 21 +++-- > 1 file changed, 3 insertions(+), 18 deletions(-) > > diff --git a/tools/python/xen/lowlevel/xc/xc.c > b/tools/python/xen/lowlevel/xc/xc.c > index d95d459..c70b388 100644 > --- a/tools/python/xen/lowlevel/xc/xc.c > +++ b/tools/python/xen/lowlevel/xc/xc.c > @@ -2126,8 +2126,6 @@ static PyObject *pyflask_context_to_sid(PyObject *self, > PyObject *args, > { > xc_interface *xc_handle; > char *ctx; > -char *buf; > -uint32_t len; > uint32_t sid; > int ret; > > @@ -2137,28 +2135,15 @@ static PyObject *pyflask_context_to_sid(PyObject > *self, PyObject *args, >&ctx) ) > return NULL; > > -len = strlen(ctx); > - > -buf = malloc(len); > -if (!buf) { > -errno = -ENOMEM; > -PyErr_SetFromErrno(xc_error_obj); > -} > - > -memcpy(buf, ctx, len); > - > xc_handle = xc_interface_open(0,0,0); > if (!xc_handle) { > -free(buf); > return PyErr_SetFromErrno(xc_error_obj); > } > - > -ret = xc_flask_context_to_sid(xc_handle, buf, len, &sid); > - > + > +ret = xc_flask_context_to_sid(xc_handle, ctx, strlen(ctx), &sid); > + > xc_interface_close(xc_handle); > > -free(buf); > - > if ( ret != 0 ) { > errno = -ret; > return PyErr_SetFromErrno(xc_error_obj); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 2/3] python/xc: Fix multiple issues in pyxc_readconsolering()
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote: > Don't leak a 16k allocation if PyArg_ParseTupleAndKeywords() or the first > xc_readconsolering() fail. It is trivial to run throught the processes memory > by repeatedly passing junk parameters to this function. > > In the case that the call to xc_readconsolering() in the while loop fails, > reinstate str before breaking out, and passing a spurious pointer to free(). > > Signed-off-by: Andrew Cooper > Coverity-IDs: 1054984 1055906 > CC: Ian Campbell > CC: Ian Jackson > CC: Wei Liu > CC: Xen Coverity Team Acked-by: Ian Campbell > --- > tools/python/xen/lowlevel/xc/xc.c | 13 ++--- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/tools/python/xen/lowlevel/xc/xc.c > b/tools/python/xen/lowlevel/xc/xc.c > index c70b388..2aa0dc7 100644 > --- a/tools/python/xen/lowlevel/xc/xc.c > +++ b/tools/python/xen/lowlevel/xc/xc.c > @@ -1089,7 +1089,7 @@ static PyObject *pyxc_readconsolering(XcObject *self, > { > unsigned int clear = 0, index = 0, incremental = 0; > unsigned int count = 16384 + 1, size = count; > -char*str = malloc(size), *ptr; > +char*str, *ptr; > PyObject*obj; > int ret; > > @@ -1097,15 +1097,17 @@ static PyObject *pyxc_readconsolering(XcObject *self, > > if ( !PyArg_ParseTupleAndKeywords(args, kwds, "|iii", kwd_list, >&clear, &index, &incremental) || > - !str ) > + !(str = malloc(size)) ) > return NULL; > > ret = xc_readconsolering(self->xc_handle, str, &count, clear, > incremental, &index); > -if ( ret < 0 ) > +if ( ret < 0 ) { > +free(str); > return pyxc_error_to_exception(self->xc_handle); > +} > > -while ( !incremental && count == size ) > +while ( !incremental && count == size && ret >= 0 ) > { > size += count - 1; > if ( size < count ) > @@ -1119,9 +1121,6 @@ static PyObject *pyxc_readconsolering(XcObject *self, > count = size - count; > ret = xc_readconsolering(self->xc_handle, str, &count, clear, > 1, &index); > -if ( ret < 0 ) > -break; > - > count += str - ptr; > str = ptr; > } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 1/3] python/xc: Fix multiple issues in pyflask_context_to_sid()
On 28/11/14 11:37, Ian Campbell wrote: > On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote: >> The error handling from a failed memory allocation should return >> PyErr_SetFromErrno(xc_error_obj); rather than simply calling it and >> continuing >> to the memcpy() below, with the dest pointer being NULL. >> >> Furthermore, the context string is simply an input parameter to the >> hypercall, >> and is not mutated anywhere along the way. The error handling elsewhere in >> the function can be simplified by not duplicating it to start with. >> >> Signed-off-by: Andrew Cooper >> Coverity-IDs: 1055305 1055721 >> CC: Ian Campbell >> CC: Ian Jackson >> CC: Wei Liu >> CC: Xen Coverity Team > Acked-by: Ian Campbell > > This would have been far more obviously correct for 4.5 if you had stuck > to fixing the issue in the first paragraph. Hmm - I appear to have missed a detail in the description. One of the CIDs is to do with putting a string in a new buffer without a NUL terminator, and passing it as a char* into xc_flask_context_to_sid; the issue being that anyone expecting things like strlen() to work will be in for a nasty shock. One solution to this was strdup(), but as the duplication was unnecessary anyway, it was easier just to drop it all. ~Andrew > >> --- >> tools/python/xen/lowlevel/xc/xc.c | 21 +++-- >> 1 file changed, 3 insertions(+), 18 deletions(-) >> >> diff --git a/tools/python/xen/lowlevel/xc/xc.c >> b/tools/python/xen/lowlevel/xc/xc.c >> index d95d459..c70b388 100644 >> --- a/tools/python/xen/lowlevel/xc/xc.c >> +++ b/tools/python/xen/lowlevel/xc/xc.c >> @@ -2126,8 +2126,6 @@ static PyObject *pyflask_context_to_sid(PyObject >> *self, PyObject *args, >> { >> xc_interface *xc_handle; >> char *ctx; >> -char *buf; >> -uint32_t len; >> uint32_t sid; >> int ret; >> >> @@ -2137,28 +2135,15 @@ static PyObject *pyflask_context_to_sid(PyObject >> *self, PyObject *args, >>&ctx) ) >> return NULL; >> >> -len = strlen(ctx); >> - >> -buf = malloc(len); >> -if (!buf) { >> -errno = -ENOMEM; >> -PyErr_SetFromErrno(xc_error_obj); >> -} >> - >> -memcpy(buf, ctx, len); >> - >> xc_handle = xc_interface_open(0,0,0); >> if (!xc_handle) { >> -free(buf); >> return PyErr_SetFromErrno(xc_error_obj); >> } >> - >> -ret = xc_flask_context_to_sid(xc_handle, buf, len, &sid); >> - >> + >> +ret = xc_flask_context_to_sid(xc_handle, ctx, strlen(ctx), &sid); >> + >> xc_interface_close(xc_handle); >> >> -free(buf); >> - >> if ( ret != 0 ) { >> errno = -ret; >> return PyErr_SetFromErrno(xc_error_obj); > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote: > state at the GIC level. This would also avoid masking the output signal > and requires specific handling in the guest OS. "which requires"? It doesn't seem quite right to me otherwise, since context switching the virq state *removes* the need to have the guest do anything other than what it would do on native. Assuming this is what you meant I propose (fixing some grammar etc as I go): xen/arm: Handle platforms with edge-triggered virtual timer Some platforms (such as the ARMv8 model) use an edge-triggered interrupt for the virtual timer. Even if the timer output signal is masked in the context switch, the GIC will keep track that of any interrupts raised while IRQs are disabled. As soon as IRQs are re-enabled, the virtual interrupt timer will be injected to Xen. If an idle vVCPU was scheduled next then the interrupt handler doesn't expect to the receive the IRQ and will crash: (XEN)[<00228388>] _spin_lock_irqsave+0x28/0x94 (PC) (XEN)[<00228380>] _spin_lock_irqsave+0x20/0x94 (LR) (XEN)[<00250510>] vgic_vcpu_inject_irq+0x40/0x1b0 (XEN)[<0024bcd0>] vtimer_interrupt+0x4c/0x54 (XEN)[<00247010>] do_IRQ+0x1a4/0x220 (XEN)[<00244864>] gic_interrupt+0x50/0xec (XEN)[<0024fbac>] do_trap_irq+0x20/0x2c (XEN)[<00255240>] hyp_irq+0x5c/0x60 (XEN)[<00241084>] context_switch+0xb8/0xc4 (XEN)[<0022482c>] schedule+0x684/0x6d0 (XEN)[<0022785c>] __do_softirq+0xcc/0xe8 (XEN)[<002278d4>] do_softirq+0x14/0x1c (XEN)[<00240fac>] idle_loop+0x134/0x154 (XEN)[<0024c160>] start_secondary+0x14c/0x15c (XEN)[<0001>] 0001 The proper solution is to context switch the virtual interrupt state at the GIC level. This would also avoid masking the output signal which requires specific handling in the guest OS and more complex code in Xen to deal with EOIs, and so is desirable for that reason too. Sadly, this solution requires some refactoring which would not be suitable for a freeze exception for the Xen 4.5 release. For now implement a temporary solution which ignores the virtual timer interrupt when the idle VCPU is running. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug
On Fri, 2014-11-28 at 08:32 +, M A Young wrote: > On Fri, 28 Nov 2014, Ian Campbell wrote: > > > On Fri, 2014-11-28 at 00:28 +, M A Young wrote: > >> Migrations with xl migrate --debug will fail because debugging information > >> from the receiving process is written to the stdout channel. This channel > >> is also used for status messages so the migration will fail as the sending > >> process receives an unexpected message. This patch moves the debugging > >> information to the stderr channel. > >> > >> Version 2 moves some output back to stdout that was accidentally moved > >> to stderr in the first version. > >> > >> Signed-off-by: Michael Young > > > > I think you've forgotten the attachment. > > Yes, I did. Here it is. Acked-by: Ian Campbell It's pretty big but a large chunk is pretty mechanical. Were you intending this for 4.5? (Konrad CCd). > Use stderr for debugging messages for xl migrate --debug as stdout is used > for status messages. > > --- xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c.orig 2014-10-24 15:22:40.0 > +0100 > +++ xen-4.5.0-rc1/tools/libxl/xl_cmdimpl.c 2014-11-26 22:41:41.697043321 > + > @@ -380,10 +380,10 @@ > } > static void printf_info(enum output_format output_format, > int domid, > -libxl_domain_config *d_config) > +libxl_domain_config *d_config, FILE *fh) > { > if (output_format == OUTPUT_FORMAT_SXP) > -return printf_info_sexp(domid, d_config); > +return printf_info_sexp(domid, d_config, fh); > > const char *buf; > libxl_yajl_length len = 0; > @@ -404,7 +404,7 @@ > if (s != yajl_gen_status_ok) > goto out; > > -puts(buf); > +fputs(buf, fh); > > out: > yajl_gen_free(hand); > @@ -413,7 +413,13 @@ > fprintf(stderr, > "unable to format domain config as JSON (YAJL:%d)\n", s); > > -if (ferror(stdout) || fflush(stdout)) { perror("stdout"); exit(-1); } > +if (ferror(fh) || fflush(fh)) { > +if (fh == stdout) > +perror("stdout"); > +else > +perror("stderr"); > +exit(-1); > +} > } > > static int do_daemonize(char *name) > @@ -2423,7 +2429,7 @@ > } > > if (!dom_info->quiet) > -printf("Parsing config from %s\n", config_source); > +fprintf(stderr, "Parsing config from %s\n", config_source); > > if (config_in_json) { > libxl_domain_config_from_json(ctx, &d_config, > @@ -2451,7 +2457,7 @@ > } > > if (debug || dom_info->dryrun) > -printf_info(default_output_format, -1, &d_config); > +printf_info(default_output_format, -1, &d_config, stderr); > > ret = 0; > if (dom_info->dryrun) > @@ -3403,7 +3409,7 @@ > if (default_output_format == OUTPUT_FORMAT_JSON) > s = printf_info_one_json(hand, info[i].domid, &d_config); > else > -printf_info_sexp(info[i].domid, &d_config); > +printf_info_sexp(info[i].domid, &d_config, stdout); > libxl_domain_config_dispose(&d_config); > if (s != yajl_gen_status_ok) > goto out; > @@ -4725,7 +4731,7 @@ > parse_config_data(filename, config_data, config_len, &d_config); > > if (debug || dryrun_only) > -printf_info(default_output_format, -1, &d_config); > +printf_info(default_output_format, -1, &d_config, stdout); > > if (!dryrun_only) { > fprintf(stderr, "setting dom%d configuration\n", domid); > --- xen-4.5.0-rc1/tools/libxl/xl_sxp.c.orig 2014-10-24 15:22:40.0 > +0100 > +++ xen-4.5.0-rc1/tools/libxl/xl_sxp.c 2014-11-26 22:30:58.416394082 + > @@ -30,7 +30,7 @@ > /* In general you should not add new output to this function since it > * is intended only for legacy use. > */ > -void printf_info_sexp(int domid, libxl_domain_config *d_config) > +void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh) > { > int i; > libxl_dominfo info; > @@ -38,195 +38,195 @@ > libxl_domain_create_info *c_info = &d_config->c_info; > libxl_domain_build_info *b_info = &d_config->b_info; > > -printf("(domain\n\t(domid %d)\n", domid); > -printf("\t(create_info)\n"); > -printf("\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM); > -printf("\t(hap %s)\n", libxl_defbool_to_string(c_info->hap)); > -printf("\t(oos %s)\n", libxl_defbool_to_string(c_info->oos)); > -printf("\t(ssidref %d)\n", c_info->ssidref); > -printf("\t(name %s)\n", c_info->name); > +fprintf(fh, "(domain\n\t(domid %d)\n", domid); > +fprintf(fh, "\t(create_info)\n"); > +fprintf(fh, "\t(hvm %d)\n", c_info->type == LIBXL_DOMAIN_TYPE_HVM); > +fprintf(fh, "\t(hap %s)\n", libxl_defbool_to_string(c_info->hap)); > +fprintf(fh, "\t(oos %s)\n", libxl_defbool_to_string(c_info->oos)); > +fprintf(fh, "\t(ssidref %d)\n", c_info->ssidref); > +fprintf(fh, "\t(name %s
Re: [Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest
On Fri, 2014-11-28 at 18:52 +0800, Liang Li wrote: > If hardware support the 1GB pages, expose the feature to guest by > default. Users don't have to use a 'cpuid= ' option in config fil > e to turn it on. > > If guest use shadow mode, the 1GB pages feature will be hidden from > guest, this is done in the function hvm_cpuid(). So the change is > okay for shadow mode case. > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang FTR although this is strictly speaking a toolstack patch I think the main ack required should be from the x86 hypervisor guys... > --- > tools/libxc/xc_cpuid_x86.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c > index a18b1ff..c97f91a 100644 > --- a/tools/libxc/xc_cpuid_x86.c > +++ b/tools/libxc/xc_cpuid_x86.c > @@ -109,6 +109,7 @@ static void amd_xc_cpuid_policy( > regs[3] &= (0x0183f3ff | /* features shared with 0x0001:EDX */ > bitmaskof(X86_FEATURE_NX) | > bitmaskof(X86_FEATURE_LM) | > +bitmaskof(X86_FEATURE_PAGE1GB) | > bitmaskof(X86_FEATURE_SYSCALL) | > bitmaskof(X86_FEATURE_MP) | > bitmaskof(X86_FEATURE_MMXEXT) | > @@ -192,6 +193,7 @@ static void intel_xc_cpuid_policy( > bitmaskof(X86_FEATURE_ABM)); > regs[3] &= (bitmaskof(X86_FEATURE_NX) | > bitmaskof(X86_FEATURE_LM) | > +bitmaskof(X86_FEATURE_PAGE1GB) | > bitmaskof(X86_FEATURE_SYSCALL) | > bitmaskof(X86_FEATURE_RDTSCP)); > break; > @@ -386,6 +388,7 @@ static void xc_cpuid_hvm_policy( > clear_bit(X86_FEATURE_LM, regs[3]); > clear_bit(X86_FEATURE_NX, regs[3]); > clear_bit(X86_FEATURE_PSE36, regs[3]); > +clear_bit(X86_FEATURE_PAGE1GB, regs[3]); > } > break; > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass
Jan Beulich writes ("Re: [Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass"): > On 28.11.14 at 10:07, wrote: > > Tests which did not succeed and are blocking, > > including tests which could not be run: > > test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. > > 31781 > > Wasn't the swiotlb problem supposed to be dealt with by now? > swiotlb=65536 looks to be in place here, yet that still appears to > not be big enough... That was just an attempted workaround, which we knew might or might not work. David Vrabel posted some patches to lmkl which are supposed to properly fix it. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] fix migration failure with xl migrate --debug
Ian Campbell writes ("Re: [PATCH v2] fix migration failure with xl migrate --debug"): > Acked-by: Ian Campbell Thanks for reviewing it :-). > It's pretty big but a large chunk is pretty mechanical. Were you > intending this for 4.5? (Konrad CCd). I think (based on reading the thread but not the code) that this ought to be in 4.5. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] missing chunk of HVM direct kernel boot patch
On Fri, 2014-11-28 at 13:55 +0800, Chunyan Liu wrote: > Found by Stefano, this chunk of the patch was never applied to > xen-unstable (commit 11dffa2359e8a2629490c14c029c7c7c777b3e47), > see http://marc.info/?l=qemu-devel&m=140471493425353&w=2. How strange, "git am" usually makes it pretty difficult to miss hunks. Sorry about this. > Signed-off-by: Chunyan Liu Acked-by: Ian Campbell I'm afraid that despite the circumstances this still needs a release ack from Konrad. Obviously the upside is fixing a partially implemented feature, but I'm not sure what the downsides are. Has this been tested with stubdoms, including when this feature is not used? My biggest concern is that because this function is also used to build the command line for the stubdom and the stubdom is PV and hence has at least a ->kernel setting then this new code might break that use case, by adding these options when they are not wanted. This path is all a bit tangled so I'm not 100% sure if those fields are actually set or not. > --- > tools/libxl/libxl_dm.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index 3e191c3..b25b574 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -527,6 +527,15 @@ static char ** > libxl__build_device_model_args_new(libxl__gc *gc, > if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) { > int ioemu_nics = 0; > > +if (b_info->kernel) > +flexarray_vappend(dm_args, "-kernel", b_info->kernel, NULL); > + > +if (b_info->ramdisk) > +flexarray_vappend(dm_args, "-initrd", b_info->ramdisk, NULL); > + > +if (b_info->cmdline) > +flexarray_vappend(dm_args, "-append", b_info->cmdline, NULL); > + > if (b_info->u.hvm.serial || b_info->u.hvm.serial_list) { > if ( b_info->u.hvm.serial && b_info->u.hvm.serial_list ) > { ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 0/3] Coverity fixes for python lowlevel libraries
On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote: > While Xend is certainly dead and gone, XenServer at the very least still has > consumers of these python libraries. AIUI XS mainly uses the xs.c stuff, with the xc.c stuff being mainly debug utilities. This is important because while the xs.c is simple and obviously correct the xc.c patches are a little more involved. > > Konrad: I am requesting a release ack for this. All 5 issues are bugs with > the handling of error cases, rather than with the basic functionality > provided. With these changes, Coverity is of the opinion that the python > libraries are perfect (0 issues), and I feel this is a worthy position to be > in for 4.5 > > Andrew Cooper (3): > python/xc: Fix multiple issues in pyflask_context_to_sid() > python/xc: Fix multiple issues in pyxc_readconsolering() > python/xs: Correct the indirection of the NULL xshandle() check > > tools/python/xen/lowlevel/xc/xc.c | 34 +- > tools/python/xen/lowlevel/xs/xs.c |2 +- > 2 files changed, 10 insertions(+), 26 deletions(-) > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] fix segfault in xl migrate --debug
On Wed, 2014-11-26 at 21:19 +, Andrew Cooper wrote: > On 26/11/2014 19:54, M A Young wrote: > > > If differences are found during the verification phase of xl migrate > > --debug then it is likely to crash with a segfault because the > > bogus > > pagebuf->pfn_types[pfn] is used in a print statement instead of > > pfn_type[pfn] . > > > > Signed-off-by: Michael Young > > > > > > > > Reviewed-by: Andrew Cooper Acked-by: Ian Campbell Needs a release ack if this is to be for 4.5, Konrad CCd. On the one hand this fixes an issue which is only present if you enable debug/verify mode, so it's not that critical. On the other hand it only touches code which is used if you enable debug/verify mode, so it's not that risky. I'm inclined towards the apply it for 4.5 end of the scale... > > > xl migrate --debug can segfault because pagebuf->pfn_types[pfn] is > > used in a print statement instead of pfn_type[pfn] > > > > --- xen-4.5.0-rc1/tools/libxc/xc_domain_restore.c.orig 2014-10-24 > > 15:22:40.0 +0100 > > +++ xen-4.5.0-rc1/tools/libxc/xc_domain_restore.c 2014-11-25 > > 21:01:16.604081467 + > > @@ -1404,7 +1404,7 @@ > > int v; > > > > DPRINTF("** pfn=%lx type=%lx gotcs=%08lx " > > -"actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn], > > +"actualcs=%08lx\n", pfn, pfn_type[pfn], > > csum_page(region_base + i * PAGE_SIZE), > > csum_page(buf)); > > > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-xen-4.5] libxl: Allow copying smaller bitmap into a larger one
On Tue, 2014-11-25 at 10:54 -0500, Konrad Rzeszutek Wilk wrote: > > >>Reported-by: Boris Ostrovsky > > >>Signed-off-by: Wei Liu > > >>Cc: Ian Campbell > > >>Cc: Ian Jackson > > >>Cc: Dario Faggioli > > >> > > >If this end up being the approach, it can have the following: > > > > > >Reviewed-by: Dario Faggioli > > > > Tested-by: Boris Ostrovsky > > Release-Acked-by: Konrad Rzeszutek Wilk Acked +Applied. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3] INSTALL: correct EXTRA_CFLAGS handling
On Thu, 2014-11-27 at 10:17 -0500, Konrad Rzeszutek Wilk wrote: > On November 27, 2014 4:26:26 AM EST, Olaf Hering wrote: > >The already documented configure patch was not applied. > >Adjust documentation to describe existing behaviour. > > > >Signed-off-by: Olaf Hering > >Cc: Ian Campbell > >Cc: Ian Jackson > >Cc: Konrad Rzeszutek Wilk > > Reviewed-by: me. > > Don't need an release ack for it. Applied. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] tools/oxenstored: Fix | vs & error in fd event handling
On Wed, 2014-11-26 at 16:09 -0500, Konrad Rzeszutek Wilk wrote: > On Wed, Nov 26, 2014 at 08:44:41PM +, Dave Scott wrote: > > > > > On 26 Nov 2014, at 18:41, Konrad Rzeszutek Wilk > > > wrote: > > > > > > On Wed, Nov 26, 2014 at 06:24:11PM +, Dave Scott wrote: > > >> > > >>> On 26 Nov 2014, at 15:38, Zheng Li wrote: > > >>> > > >>> On 26/11/2014 15:09, Andrew Cooper wrote: > > This makes fields 0 and 1 true more often than they should be, > > resulting > > problems when handling events. > > >>> > > >>> Indeed, looks like a mistake I made when rewriting the logic terms > > >>> lately. The result is POLLUP or POLLERR events being returned in more > > >>> categories than we'd interest. Thanks for fixing this! > > >>> > > >>> Acked-by: Zheng Li > > >> > > >> This also looks fine to me > > >> > > >> Acked-by: David Scott > > > > > > Would it be possible to get an Reviewed-by please? > > > > I’ll certainly offer > > > > Reviewed-by: David Scott > > OK, Release-Acked-by: Konrad Rzeszutek Wilk Applied, thanks. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated
On Tue, 2014-11-25 at 11:03 -0500, Konrad Rzeszutek Wilk wrote: > On Tue, Nov 25, 2014 at 03:58:33PM +, Ian Campbell wrote: > > On Tue, 2014-11-25 at 10:51 -0500, Konrad Rzeszutek Wilk wrote: > > > Right, so with the reassurance text added on: > > > Release-Acked-by: Konrad Rzeszutek Wilk > > > > Thanks. > > > > Chunyan, I propose to commit adding the following to the commit text of > > "[PATCH 1/2 V3] remove domain field in xenstore backend dir": > > > > The correct way to obtain a domain's name is via > > libxl_domid_to_name(), or by reading > > from /local/domain/$DOMID/name for toolstacks not using libxl. > > > > Does that sound OK to you? > > Yes. Thanks, that was really a question for Chunyan, but rather than wait any longer I've applied with that change. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0
On Thu, 2014-11-20 at 14:44 -0500, Konrad Rzeszutek Wilk wrote: > On Thu, Nov 20, 2014 at 03:48:17PM +, Wei Liu wrote: > > On Thu, Nov 20, 2014 at 03:29:51PM +, Ian Campbell wrote: > > > On Wed, 2014-11-19 at 21:24 +, Wei Liu wrote: > > > > On Wed, Nov 19, 2014 at 04:08:46PM -0500, Konrad Rzeszutek Wilk wrote: > > > > > On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote: > > > > > > Before this patch, pv guest video_memkb is -1, which is an invalid > > > > > > value. > > > > > > And it will cause the xenstore 'memory/targe' calculation wrong: > > > > > > > > > > > > memory/target = info->target_memkb - info->video_memkb > > > > > > > > > > CC-ing the maintainers. > > > > > > > > > > Is this an regression as compared to Xen 4.4 or is this also in Xen > > > > > 4.4? > > > > > > > > > > > > > I don't think this is a regression, it has been broken for quite a > > > > while. > > > > > > I think so to. > > > > > > On the other hand its a pretty clear bug to use video_memkb == -1 and > > > we've seen that it causes real issues. The fix is also fairly obvious. > > > I'm inclined towards suggesting we fix this in 4.5. > > > > > > > I concur. > > Lets do it then. RElease-Acked-by: Konrad Rzeszutek Wilk > APplied, thanks. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 1/3] python/xc: Fix multiple issues in pyflask_context_to_sid()
On Fri, 2014-11-28 at 11:47 +, Andrew Cooper wrote: > On 28/11/14 11:37, Ian Campbell wrote: > > On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote: > >> The error handling from a failed memory allocation should return > >> PyErr_SetFromErrno(xc_error_obj); rather than simply calling it and > >> continuing > >> to the memcpy() below, with the dest pointer being NULL. > >> > >> Furthermore, the context string is simply an input parameter to the > >> hypercall, > >> and is not mutated anywhere along the way. The error handling elsewhere in > >> the function can be simplified by not duplicating it to start with. > >> > >> Signed-off-by: Andrew Cooper > >> Coverity-IDs: 1055305 1055721 > >> CC: Ian Campbell > >> CC: Ian Jackson > >> CC: Wei Liu > >> CC: Xen Coverity Team > > Acked-by: Ian Campbell > > > > This would have been far more obviously correct for 4.5 if you had stuck > > to fixing the issue in the first paragraph. > > Hmm - I appear to have missed a detail in the description. One of the > CIDs is to do with putting a string in a new buffer without a NUL > terminator, and passing it as a char* into xc_flask_context_to_sid; the > issue being that anyone expecting things like strlen() to work will be > in for a nasty shock. ISTR a discussion of this interface in Julien's device-tree passthrough thing a while back and some sort of conclusion that the hypervisor was supposed to use the len field and not rely on the NULL. I'm not sure that necessarily invalidates what you are saying, since even given that throwing a NULL on the end would be friendly to libxc consumers if nothing else. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xenstore: Clarify xs_open() semantics
Added to the xs_open() comments in xenstore.h. The text has been taken almost verbatim from a xen-devel email by Ian Campbell, and confirmed as accurate by Ian Jackson. Signed-off-by: Razvan Cojocaru Suggested-off-by: Ian Campbell --- tools/xenstore/include/xenstore.h | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/tools/xenstore/include/xenstore.h b/tools/xenstore/include/xenstore.h index fdf5e76..b4b113e 100644 --- a/tools/xenstore/include/xenstore.h +++ b/tools/xenstore/include/xenstore.h @@ -59,10 +59,20 @@ typedef uint32_t xs_transaction_t; /* On failure, these routines set errno. */ /* Open a connection to the xs daemon. - * Attempts to make a connection over the socket interface, + * Attempts to make a connection over the socket interface, * and if it fails, then over the xenbus interface. * Mode 0 specifies read-write access, XS_OPEN_READONLY for * read-only access. + * + * * Connections made with xs_open(0) (which might be shared page or + * socket based) are only guaranteed to work in the parent after + * fork. + * * Connections made with xs_open(XS_OPEN_SOCKETONLY) will be usable + * in either the parent or the child after fork, but not both. + * * xs_daemon_open*() and xs_domain_open() are deprecated synonyms + * for xs_open(0). + * * XS_OPEN_READONLY has no bearing on any of this. + * * Returns a handle or NULL. */ struct xs_handle *xs_open(unsigned long flags); -- 1.7.9.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH 0/4] Introduction of the patches.
longtao.pang writes ("[OSSTEST PATCH 0/4] Introduction of the patches."): > We updated these patchs about adding Nested test job into OSSTest. Thanks for your contribution. Having some testing of nested HVM would be good. But I'm not convinced that these patches take the right approach to achieving that. There seems to be a great deal of duplication of code. I think we should have a conversation about what moving parts are necessary for nested HVM testing. I would guess that you could reuse ts-debian-hvm-install for the initial install of the L1 guest, and then ts-xen-install to install Xen on it. Finally, you could run some other ts-* scripts to install your L2 guests. I think you would probably find that there are only some small changes needed to make those existing scripts flexible enough. And I don't understand why you need to rebuild anything. Surely the already-built hypervisor and tools ought to work just as well in the L1 guest ? Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xenstore: Clarify xs_open() semantics
Razvan Cojocaru writes ("[PATCH] xenstore: Clarify xs_open() semantics"): > Added to the xs_open() comments in xenstore.h. The text has been > taken almost verbatim from a xen-devel email by Ian Campbell, > and confirmed as accurate by Ian Jackson. > > Signed-off-by: Razvan Cojocaru > Suggested-off-by: Ian Campbell Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote: > I propose to reword the commit message into: You'll want to change the code comment too I think. > > xen/arm: Handle platforms with edge-triggered virtual timer > > Some platforms (such as the ARMv8 model) uses edge-triggered interrupt > for the virtual timer. Even if the timer output signal is masked in the > context switch, the GIC will keep track that of any raised interrupt > while the IRQs has been disabled. As soon as the IRQs are re-enabled, > the virtual interrupt timer will be injected to Xen. > > The interrupt handler doesn't except to the receive the IRQ and will > crash if an idle vCPU is running: > > (XEN)[<00228388>] _spin_lock_irqsave+0x28/0x94 (PC) > (XEN)[<00228380>] _spin_lock_irqsave+0x20/0x94 (LR) > (XEN)[<00250510>] vgic_vcpu_inject_irq+0x40/0x1b0 > (XEN)[<0024bcd0>] vtimer_interrupt+0x4c/0x54 > (XEN)[<00247010>] do_IRQ+0x1a4/0x220 > (XEN)[<00244864>] gic_interrupt+0x50/0xec > (XEN)[<0024fbac>] do_trap_irq+0x20/0x2c > (XEN)[<00255240>] hyp_irq+0x5c/0x60 > (XEN)[<00241084>] context_switch+0xb8/0xc4 > (XEN)[<0022482c>] schedule+0x684/0x6d0 > (XEN)[<0022785c>] __do_softirq+0xcc/0xe8 > (XEN)[<002278d4>] do_softirq+0x14/0x1c > (XEN)[<00240fac>] idle_loop+0x134/0x154 > (XEN)[<0024c160>] start_secondary+0x14c/0x15c > (XEN)[<0001>] 0001 > > The proper solution would be context/switching the virtual interrupt > state at the GIC level. This would also avoid masking the output signal > and requires specific handling in the guest OS. > > Sadly, this solution require some refactoring which would miss the Xen > 4.5 release. > > For now implement a temporary solution which ignore the interrupt when > the idle VCPU is running. > > Regards, > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/6] libxl: events: Assert that libxl_ctx_free is not called from a hook
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > No-one in their right mind would do this, and if they did everything > would definitely collapse. Arrange that if this happens, we crash > ASAP. > > Signed-off-by: Ian Jackson Acked-by: Ian Campbell > --- > tools/libxl/libxl.c |2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index de23fec..c111f27 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -148,6 +148,8 @@ int libxl_ctx_free(libxl_ctx *ctx) > { > if (!ctx) return 0; > > +assert(!ctx->osevent_in_hook); > + > int i; > GC_INIT(ctx); > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [seabios test] 31885: tolerable FAIL - PUSHED
flight 31885 seabios real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/31885/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-sedf-pin 7 debian-install fail pass in 31857 test-amd64-amd64-xl-pcipt-intel 5 xen-bootfail in 31857 pass in 31885 test-amd64-i386-pair 7 xen-boot/src_host fail in 31857 pass in 31885 Regressions which are regarded as allowable (not blocking): test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 31849 Tests which did not succeed, but are not blocking: test-amd64-i386-libvirt 9 guest-start fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-libvirt 9 guest-start fail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass version targeted for testing: seabios b7f4a76a929ce4acd60e89aa273a8b208daa8233 baseline version: seabios 9f505f715793d99235bd6b4afb2ca7b96ba5729b People who touched revisions under test: Gerd Hoffmann jobs: build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-amd64-i386-xl pass test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-amd64-amd64-xl-win7-amd64 fail test-amd64-i386-xl-win7-amd64fail test-amd64-i386-xl-credit2 pass test-amd64-i386-freebsd10-i386 pass test-amd64-amd64-xl-pcipt-intel fail test-amd64-i386-rhel6hvm-intel pass test-amd64-i386-qemut-rhel6hvm-intel pass test-amd64-i386-qemuu-rhel6hvm-intel pass test-amd64-amd64-libvirt fail test-amd64-i386-libvirt fail test-amd64-i386-xl-multivcpu pass test-amd64-amd64-pairpass test-amd64-i386-pair fail test-amd64-amd64-xl-sedf-pin fail test-amd64-amd64-xl-sedf
Re: [Xen-devel] [PATCH 3/6] libxl: events: Deregister, don't just modify, sigchld pipe fd
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > We want to have no fd events registered when we are idle. This > implies that we must be able to deregister our interest in the sigchld > self-pipe fd, not just modify to request no events. > > Signed-off-by: Ian Jackson Acked-by: Ian Campbell I think in principal there is now some redundant code in libxl__sigchld_needed which calls modify to set POLLIN if the fd is not registered. I think it's redundant because now the fd is registered iff it is looking for POLLIN. > --- > tools/libxl/libxl_fork.c |9 + > 1 file changed, 1 insertion(+), 8 deletions(-) > > diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c > index fa15095..144208a 100644 > --- a/tools/libxl/libxl_fork.c > +++ b/tools/libxl/libxl_fork.c > @@ -372,15 +372,8 @@ static void sigchld_user_remove(libxl_ctx *ctx) /* > idempotent */ > > void libxl__sigchld_notneeded(libxl__gc *gc) /* non-reentrant, idempotent */ > { > -int rc; > - > sigchld_user_remove(CTX); > - > -if (libxl__ev_fd_isregistered(&CTX->sigchld_selfpipe_efd)) { > -rc = libxl__ev_fd_modify(gc, &CTX->sigchld_selfpipe_efd, 0); > -if (rc) > -libxl__ev_fd_deregister(gc, &CTX->sigchld_selfpipe_efd); > -} > +libxl__ev_fd_deregister(gc, &CTX->sigchld_selfpipe_efd); > } > > int libxl__sigchld_needed(libxl__gc *gc) /* non-reentrant, idempotent */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
Hi Ian, On 28/11/14 12:32, Ian Campbell wrote: > On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote: >> I propose to reword the commit message into: > > You'll want to change the code comment too I think. It was my plan. I will send a new version during the day. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
Hi Ian, On 28/11/14 11:47, Ian Campbell wrote: > On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote: >> state at the GIC level. This would also avoid masking the output signal >> and requires specific handling in the guest OS. > > "which requires"? > > It doesn't seem quite right to me otherwise, since context switching the > virq state *removes* the need to have the guest do anything other than > what it would do on native. I though the "avoid" would apply for both "masking" and "requires". > Assuming this is what you meant I propose (fixing some grammar etc as I > go): Thanks for the correction, I will use this version. Shall I put your signed-off-by? Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 4/6] libxl: events: Tear down SIGCHLD machinery on ctx destruction
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > We want to have no fd events registered when we are idle. > Also, we should put back the default SIGCHLD handler. So: > > * In libxl_ctx_free, use libxl_childproc_setmode to set the mode to >the default, which is libxl_sigchld_owner_libxl (ie `libxl owns >SIGCHLD only when it has active children'). > >But of course there are no active children at libxl teardown so >this results in libxl__sigchld_notneeded: the ctx loses its >interest in SIGCHLD (unsetting the SIGCHLD handler if we were the >last ctx) and deregisters the per-ctx selfpipe fd. > > * assert that this is the case: ie that we are no longer interested >in the selfpipe. > > Signed-off-by: Ian Jackson Acked-by: Ian Campbell > --- > tools/libxl/libxl.c |3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 12a1013..785253d 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -162,11 +162,12 @@ int libxl_ctx_free(libxl_ctx *ctx) > while ((eject = LIBXL_LIST_FIRST(&CTX->disk_eject_evgens))) > libxl__evdisable_disk_eject(gc, eject); > > +libxl_childproc_setmode(CTX,0,0); > for (i = 0; i < ctx->watch_nslots; i++) > assert(!libxl__watch_slot_contents(gc, i)); > assert(!libxl__ev_fd_isregistered(&ctx->watch_efd)); > libxl__ev_fd_deregister(gc, &ctx->evtchn_efd); > -libxl__ev_fd_deregister(gc, &ctx->sigchld_selfpipe_efd); > +assert(!libxl__ev_fd_isregistered(&ctx->sigchld_selfpipe_efd)); > > /* Now there should be no more events requested from the application: */ > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC 7/7] tools/tests: Remove superfluous and incomplete spinlock from xen-access
On 11/12/2014 05:31 PM, Tamas K Lengyel wrote: > The spin-lock implementation in the xen-access test program is implemented > in a fashion that is actually incomplete. The x86 assembly that guarantees > that > the lock is held by only one thread lacks the "lock;" instruction. > > However, the spin-lock is not actually necessary in xen-access as it is not > multithreaded. The presence of the faulty implementation of the lock in a non- > mulithreaded environment is unnecessarily complicated for developers who are > trying to follow this code as a guide in implementing their own applications. > Thus, removing it from the code improves the clarity on the behavior of the > system. > > Signed-off-by: Tamas K Lengyel > --- > tools/tests/xen-access/xen-access.c | 68 > - > 1 file changed, 6 insertions(+), 62 deletions(-) > > diff --git a/tools/tests/xen-access/xen-access.c > b/tools/tests/xen-access/xen-access.c > index 3538323..428c459 100644 > --- a/tools/tests/xen-access/xen-access.c > +++ b/tools/tests/xen-access/xen-access.c > @@ -45,56 +45,6 @@ > #define ERROR(a, b...) fprintf(stderr, a "\n", ## b) > #define PERROR(a, b...) fprintf(stderr, a ": %s\n", ## b, strerror(errno)) > > -/* Spinlock and mem event definitions */ > - > -#define SPIN_LOCK_UNLOCKED 0 > - > -#define ADDR (*(volatile long *) addr) > -/** > - * test_and_set_bit - Set a bit and return its old value > - * @nr: Bit to set > - * @addr: Address to count from > - * > - * This operation is atomic and cannot be reordered. > - * It also implies a memory barrier. > - */ > -static inline int test_and_set_bit(int nr, volatile void *addr) > -{ > -int oldbit; > - > -asm volatile ( > -"btsl %2,%1\n\tsbbl %0,%0" > -: "=r" (oldbit), "=m" (ADDR) > -: "Ir" (nr), "m" (ADDR) : "memory"); > -return oldbit; > -} > - > -typedef int spinlock_t; > - > -static inline void spin_lock(spinlock_t *lock) > -{ > -while ( test_and_set_bit(1, lock) ); > -} > - > -static inline void spin_lock_init(spinlock_t *lock) > -{ > -*lock = SPIN_LOCK_UNLOCKED; > -} > - > -static inline void spin_unlock(spinlock_t *lock) > -{ > -*lock = SPIN_LOCK_UNLOCKED; > -} > - > -static inline int spin_trylock(spinlock_t *lock) > -{ > -return !test_and_set_bit(1, lock); > -} > - > -#define vm_event_ring_lock_init(_m) spin_lock_init(&(_m)->ring_lock) > -#define vm_event_ring_lock(_m) spin_lock(&(_m)->ring_lock) > -#define vm_event_ring_unlock(_m) spin_unlock(&(_m)->ring_lock) > - > typedef struct vm_event { > domid_t domain_id; > xc_evtchn *xce_handle; > @@ -102,7 +52,6 @@ typedef struct vm_event { > vm_event_back_ring_t back_ring; > uint32_t evtchn_port; > void *ring_page; > -spinlock_t ring_lock; > } vm_event_t; > > typedef struct xenaccess { > @@ -241,9 +190,6 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t > domain_id) > /* Set domain id */ > xenaccess->vm_event.domain_id = domain_id; > > -/* Initialise lock */ > -vm_event_ring_lock_init(&xenaccess->vm_event); > - > /* Enable mem_access */ > xenaccess->vm_event.ring_page = > xc_mem_access_enable(xenaccess->xc_handle, > @@ -320,13 +266,14 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, > domid_t domain_id) > return NULL; > } > > +/* > + * Note that this function is not thread safe. > + */ > int get_request(vm_event_t *vm_event, vm_event_request_t *req) > { > vm_event_back_ring_t *back_ring; > RING_IDX req_cons; > > -vm_event_ring_lock(vm_event); > - > back_ring = &vm_event->back_ring; > req_cons = back_ring->req_cons; > > @@ -338,18 +285,17 @@ int get_request(vm_event_t *vm_event, > vm_event_request_t *req) > back_ring->req_cons = req_cons; > back_ring->sring->req_event = req_cons + 1; > > -vm_event_ring_unlock(vm_event); > - > return 0; > } > > +/* > + * Note that this function is not thread safe. > + */ > static int put_response(vm_event_t *vm_event, vm_event_response_t *rsp) > { > vm_event_back_ring_t *back_ring; > RING_IDX rsp_prod; > > -vm_event_ring_lock(vm_event); > - > back_ring = &vm_event->back_ring; > rsp_prod = back_ring->rsp_prod_pvt; > > @@ -361,8 +307,6 @@ static int put_response(vm_event_t *vm_event, > vm_event_response_t *rsp) > back_ring->rsp_prod_pvt = rsp_prod; > RING_PUSH_RESPONSES(back_ring); > > -vm_event_ring_unlock(vm_event); > - > return 0; > } I've just now noticed that get_request() and put_response() only ever return 0, so it makes no sense to check the return values later on, and they should basically either be modified to be void functions or some error checking should be added (not sure what that could be). Regards, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 6/6] libxl: events: Document and enforce actual callbacks restriction
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > libxl_event_register_callbacks cannot reasonably be called while libxl > is busy (has outstanding operations and/or enabled events). > > This is because the previous spec implied (although not entirely > clearly) that event hooks would not be called for existing fd and > timeout interests. There is thus no way to reliably ensure that libxl > would get told about fds and timeouts which it became interested in > beforehand. > > So there have to be no such fds or timeouts, which means that the > callbacks must only be registered or changed when the ctx is idle. > > Document this restriction, and enforce it with a pair of asserts. > > (It would be nicer, perhaps, to say that the application may not call > libxl_osevent_register_hooks other than right after creating the ctx. > But there are existing callers, including libvirt, who do it later - > even after doing major operations such as domain creation.) > > Signed-off-by: Ian Jackson Acked-by: Ian Campbell ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/6] libxl: events: Deregister evtchn fd when not needed
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > We want to have no fd events registered when we are idle. > In this patch, deal with the evtchn fd: > > * Defer setup of the evtchn handle to the first use. > * Defer registration of the evtchn fd; register as needed on use. > * When cancelling an evtchn wait, or when wait setup fails, check >whether there are now no evtchn waits and if so deregister the fd. > * On libxl teardown, the evtchn fd should therefore be unregistered. >assert that this is the case. Is there no locking required when registering/deregistering? Since there are list manipulations in most of these places I presume it already exists, but I thought I should check. > > Signed-off-by: Ian Jackson > --- > tools/libxl/libxl.c |4 +--- > tools/libxl/libxl_event.c| 27 +++ > tools/libxl/libxl_internal.h |1 + > 3 files changed, 21 insertions(+), 11 deletions(-) > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 785253d..e0db4eb 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -118,8 +118,6 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version, > rc = ERROR_FAIL; goto out; > } > > -rc = libxl__ctx_evtchn_init(gc); > - > *pctx = ctx; > return 0; > > @@ -166,7 +164,7 @@ int libxl_ctx_free(libxl_ctx *ctx) > for (i = 0; i < ctx->watch_nslots; i++) > assert(!libxl__watch_slot_contents(gc, i)); > assert(!libxl__ev_fd_isregistered(&ctx->watch_efd)); > -libxl__ev_fd_deregister(gc, &ctx->evtchn_efd); > +assert(!libxl__ev_fd_isregistered(&ctx->evtchn_efd)); > assert(!libxl__ev_fd_isregistered(&ctx->sigchld_selfpipe_efd)); > > /* Now there should be no more events requested from the application: */ > diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c > index da0a20e..716f318 100644 > --- a/tools/libxl/libxl_event.c > +++ b/tools/libxl/libxl_event.c > @@ -721,7 +721,7 @@ static void evtchn_fd_callback(libxl__egc *egc, > libxl__ev_fd *ev, > > int libxl__ctx_evtchn_init(libxl__gc *gc) { > xc_evtchn *xce; > -int rc, fd; > +int rc; > > if (CTX->xce) > return 0; > @@ -733,14 +733,10 @@ int libxl__ctx_evtchn_init(libxl__gc *gc) { > goto out; > } > > -fd = xc_evtchn_fd(xce); > -assert(fd >= 0); > +CTX->evtchn_fd = xc_evtchn_fd(xce); > +assert(CTX->evtchn_fd >= 0); Given that you can always retrieve this no demand with xc_evtchn_fd(xce) and that it is cheap do you need to stash it in the ctx? > -rc = libxl_fd_set_nonblock(CTX, fd, 1); > -if (rc) goto out; > - > -rc = libxl__ev_fd_register(gc, &CTX->evtchn_efd, > - evtchn_fd_callback, fd, POLLIN); > +rc = libxl_fd_set_nonblock(CTX, CTX->evtchn_fd, 1); > if (rc) goto out; > > CTX->xce = xce; > @@ -751,6 +747,12 @@ int libxl__ctx_evtchn_init(libxl__gc *gc) { > return rc; > } > > +static void evtchn_check_fd_deregister(libxl__gc *gc) > +{ > +if (CTX->xce && LIBXL_LIST_EMPTY(&CTX->evtchns_waiting)) > +libxl__ev_fd_deregister(gc, &CTX->evtchn_efd); > +} > + > int libxl__ev_evtchn_wait(libxl__gc *gc, libxl__ev_evtchn *evev) > { > int r, rc; > @@ -758,6 +760,13 @@ int libxl__ev_evtchn_wait(libxl__gc *gc, > libxl__ev_evtchn *evev) > DBG("ev_evtchn=%p port=%d wait (was waiting=%d)", > evev, evev->port, evev->waiting); > > +rc = libxl__ctx_evtchn_init(gc); > +if (rc) goto out; > + > +rc = libxl__ev_fd_register(gc, &CTX->evtchn_efd, > + evtchn_fd_callback, CTX->evtchn_fd, POLLIN); > +if (rc) goto out; Do you not need to do this only if evtchns_waiting is currently empty or the efd is idle? > + > if (evev->waiting) > return 0; If you hit this you leave the stuff above done. Which may or may not matter depending on the answer above. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 0/6] libxl: events: Tear down fd interests when idle
On Thu, 2014-11-27 at 18:30 +, Ian Jackson wrote: > Konrad: here is a set of fixes for libxl which ought IMO to be in 4.5. > See below, or the rest of the thread, for details. It's broken up > into 6 tiny patches for ease of review. I tested an identical version of this series, just without the commit logs, with libvirt on ARM. So in addition to my various acks: Tested-by: Ian Campbell ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > * On libxl teardown, the watch fd should therefore be unregistered. > assert that this is the case. A bunch of the patches in this series basically assume that the ctx is idle when it is freed, i.e. it requires everything to be explicitly cancelled rather than implicitly doing so on free. I think this is a fine restriction, but it probably ought to be written down. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-4.5 HVMOP ABI issues
>>> On 28.11.14 at 12:36, wrote: > However, it occurs to me that HVMOP_op_mask absolutely must live in the > public header files, as it is guest visible, and forms part of the ABI. > > Consider a userspace task which falls into a continuation, is preempted > and paused by the guest kernel, the entire VM migrated, and the task > unpaused later. If HVMOP_op_mask has changed in that time, the > continuation will become erroneous. > > In particular, all hypercall continuation strategies we use *must* be > forward compatible when moving to newer versions of Xen, because Xen > cannot possibly guarantee that a continuation which starts will finish > on the same hypervisor. Hmm, I see the issue, but surfacing such implementation details would not only be unfortunate, but eventually prevent us from making future changes. Just recall the mem-op case where we had to widen the continuation encoding mask at some point. Hence while I can't immediately see a way to avoid the situation you describe (and it doesn't even take a user space task in a preemptible kernel), I think we should allow ourselves a little more time to find possible solutions other than locking down these masks (really they don't need to be exposed in the public headers, but we would need them to not change if no other solution can be found). One thing to note is that the _introduction_ of such a mask (as has happened for hvm-op between 4.4 and 4.5) is not a problem as long as the respective bits all being zero has no special meaning. What I considered for mem-op a while ago though (and then forgot again) was to refuse non-zero start_extent values for any operations not making use of that mechanism. The same would of course be applicable to gnttab-op and hvm-op. What might additionally make this not that urgent an issue for 4.5 is that only XSM_DM_PRIV guarded operations are affected, and I suppose a stubdom gets re-created on the target host (rather than migrated) when its client migrates. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
On Fri, 2014-11-28 at 12:48 +, Julien Grall wrote: > Hi Ian, > > On 28/11/14 11:47, Ian Campbell wrote: > > On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote: > >> state at the GIC level. This would also avoid masking the output signal > >> and requires specific handling in the guest OS. > > > > "which requires"? > > > > It doesn't seem quite right to me otherwise, since context switching the > > virq state *removes* the need to have the guest do anything other than > > what it would do on native. > > I though the "avoid" would apply for both "masking" and "requires". I think it reads with the avoid binding tightly to the masking only. Possibly s/requires/requiring/ would have also corrected the meaning to what you intended, although I would have changed the "and" to "or" as well to make it less ambiguous. > > Assuming this is what you meant I propose (fixing some grammar etc as I > > go): > > > Thanks for the correction, I will use this version. Shall I put your > signed-off-by? I don't think that's needed, its was pretty small. (i.e. I wouldn't have added my S-o-b if I did it on commit). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-4.5 HVMOP ABI issues
On Fri, 2014-11-28 at 13:07 +, Jan Beulich wrote: > I suppose a stubdom gets re-created on the target host (rather > than migrated) when its client migrates. Correct. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed
On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > We want to have no fd events registered when we are idle. > In this patch, deal with the xenstore watch fd: > > * Track the total number of active watches. > * When deregistering a watch, or when watch registration fails, check > whether there are now no watches and if so deregister the fd. > * On libxl teardown, the watch fd should therefore be unregistered. > assert that this is the case. > > Signed-off-by: Ian Jackson Acked-by: Ian Campbell ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-4.5 HVMOP ABI issues
On 28/11/14 13:07, Jan Beulich wrote: On 28.11.14 at 12:36, wrote: >> However, it occurs to me that HVMOP_op_mask absolutely must live in the >> public header files, as it is guest visible, and forms part of the ABI. >> >> Consider a userspace task which falls into a continuation, is preempted >> and paused by the guest kernel, the entire VM migrated, and the task >> unpaused later. If HVMOP_op_mask has changed in that time, the >> continuation will become erroneous. >> >> In particular, all hypercall continuation strategies we use *must* be >> forward compatible when moving to newer versions of Xen, because Xen >> cannot possibly guarantee that a continuation which starts will finish >> on the same hypervisor. > Hmm, I see the issue, but surfacing such implementation details > would not only be unfortunate, but eventually prevent us from > making future changes. Just recall the mem-op case where we had > to widen the continuation encoding mask at some point. Hence while > I can't immediately see a way to avoid the situation you describe > (and it doesn't even take a user space task in a preemptible kernel), > I think we should allow ourselves a little more time to find possible > solutions other than locking down these masks (really they don't > need to be exposed in the public headers, but we would need > them to not change if no other solution can be found). It is certainly unfortunate, but I don't see that we have any choice. The implementation details of continuations have already slipped into the ABI. > > One thing to note is that the _introduction_ of such a mask (as > has happened for hvm-op between 4.4 and 4.5) is not a problem > as long as the respective bits all being zero has no special > meaning. What I considered for mem-op a while ago though (and > then forgot again) was to refuse non-zero start_extent values > for any operations not making use of that mechanism. The same > would of course be applicable to gnttab-op and hvm-op. > > What might additionally make this not that urgent an issue for 4.5 > is that only XSM_DM_PRIV guarded operations are affected, and > I suppose a stubdom gets re-created on the target host (rather > than migrated) when its client migrates. The problem is with continuations which reuse the upper bits of the input register, not with this HVMOP_op_mask specifically; the HVMOP_op_mask simply adds to an existing problem. This is something which needs considering urgently because, as you identify above, we have already suffered an accidental ABI breakage with the mem-op widening. 32bit HVM guests reliably form a continuation on every single iteration of a continuable hypercall (e.g. decrease reservation, which is the base of my XSA-111 PoC), making it trivial to construct a migrateable HVM domain which exposes the issue. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2] Add support for Xen ARM guest on FreeBSD
On Sun, 23 Nov 2014 22:35:36 + Julien Grall wrote: > Hello all, > > At the beginning of the year, I have sent a first RFC to add support > for FreeBSD on Xen ARM [1]. ... > Major changes in this new version: > * Add Device Tree support via Linux Boot ABI > * Add zImage support > * Netfront support > * Blkfront fixes > * DOM0 support (separate branch see below) > > The former item is very hackish. I was wondering if there is another > way to do it? Or maybe we should support FreeBSD Bootloader in ARM > guest? I think using the loader is the correct way to handle booting in Xen. It allows us to relocate the dtb as required. It look like a zImage then use the Xen console to interact with the user. > > The patch series is divided in X parts: > * #1 - #14: Clean up and bug fixes for Xen. They can be > applied without the rest of the series > * #15 - #19: Update Xen interface to 4.4 and fix > compilation. It's required for ARM. > * #20 - #26: Update Xen code to support ARM > * #27 - #33: Rework the event channel code for supporting > ARM. I will work with Royger to come with a common interface with x86 > * #34 - #36: Add support for ARM in Xen code > * #37 - #46: ARM bug fixes and new features. Some of thoses > patches (#37 - #40) could be applied without the rest of the series > * #47 - #48: Add Xen ARM platform I have committed patches 30 and 40 as they look good. I'm not familiar with the code to review 37 or 38, however from my quick look at 38 I appears _bus_dmamap_load_buffer does take in to account buflen and dmat->maxsegsz when setting sgsize just not dmat->alignment. ... > > TODO: > * Add SMP/PSCI support in FreeBSD. Could be useful other > platform too Adding PSCI support is on my TODO lost for arm64, however I don't expect to get on ti in until early next year. > * Only FreeBSD to load anywhere. Currently there is a 2M > alignment which require a patch in Xen. If you use the loader this will be fixed as the loader can load the kernel at the correct alignment. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/6] libxl: events: Deregister, don't just modify, sigchld pipe fd
Ian Campbell writes ("Re: [PATCH 3/6] libxl: events: Deregister, don't just modify, sigchld pipe fd"): > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > > We want to have no fd events registered when we are idle. This > > implies that we must be able to deregister our interest in the sigchld > > self-pipe fd, not just modify to request no events. > > > > Signed-off-by: Ian Jackson > > Acked-by: Ian Campbell > > I think in principal there is now some redundant code in > libxl__sigchld_needed which calls modify to set POLLIN if the fd is not > registered. I think it's redundant because now the fd is registered iff > it is looking for POLLIN. You're right. (Although you mean `calls modify if the fd /is/ registered'.) I wonder if it would be better to leave tidying that up until post-4.5, in case we are wrong. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/6] libxl: events: Deregister, don't just modify, sigchld pipe fd
On Fri, 2014-11-28 at 14:42 +, Ian Jackson wrote: > Ian Campbell writes ("Re: [PATCH 3/6] libxl: events: Deregister, don't just > modify, sigchld pipe fd"): > > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > > > We want to have no fd events registered when we are idle. This > > > implies that we must be able to deregister our interest in the sigchld > > > self-pipe fd, not just modify to request no events. > > > > > > Signed-off-by: Ian Jackson > > > > Acked-by: Ian Campbell > > > > I think in principal there is now some redundant code in > > libxl__sigchld_needed which calls modify to set POLLIN if the fd is not > > registered. I think it's redundant because now the fd is registered iff > > it is looking for POLLIN. > > You're right. (Although you mean `calls modify if the fd /is/ > registered'.) Indeed. > I wonder if it would be better to leave tidying that up until > post-4.5, in case we are wrong. Sure. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/6] libxl: events: Deregister evtchn fd when not needed
Ian Campbell writes ("Re: [PATCH 5/6] libxl: events: Deregister evtchn fd when not needed"): > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > > We want to have no fd events registered when we are idle. > > In this patch, deal with the evtchn fd: > > > > * Defer setup of the evtchn handle to the first use. > > * Defer registration of the evtchn fd; register as needed on use. > > * When cancelling an evtchn wait, or when wait setup fails, check > >whether there are now no evtchn waits and if so deregister the fd. > > * On libxl teardown, the evtchn fd should therefore be unregistered. > >assert that this is the case. > > Is there no locking required when registering/deregistering? Since there > are list manipulations in most of these places I presume it already > exists, but I thought I should check. libxl__ev_evtchn_* is always called with the ctx lock held. However, that they don't take the lock is contrary to the rules stated for libxl__ev_* in the doc comment. That should be fixed for 4.6. libxl__ev_fd_* already take and release the lock to protect their own data structures etc. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/6] libxl: events: Deregister evtchn fd when not needed
On Fri, 2014-11-28 at 14:47 +, Ian Jackson wrote: > Ian Campbell writes ("Re: [PATCH 5/6] libxl: events: Deregister evtchn fd > when not needed"): > > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > > > We want to have no fd events registered when we are idle. > > > In this patch, deal with the evtchn fd: > > > > > > * Defer setup of the evtchn handle to the first use. > > > * Defer registration of the evtchn fd; register as needed on use. > > > * When cancelling an evtchn wait, or when wait setup fails, check > > >whether there are now no evtchn waits and if so deregister the fd. > > > * On libxl teardown, the evtchn fd should therefore be unregistered. > > >assert that this is the case. > > > > Is there no locking required when registering/deregistering? Since there > > are list manipulations in most of these places I presume it already > > exists, but I thought I should check. > > libxl__ev_evtchn_* is always called with the ctx lock held. For the most part this is implicit due to the caller being in an ao callback, right? > However, that they don't take the lock is contrary to the rules stated > for libxl__ev_* in the doc comment. That should be fixed for 4.6. OK. > libxl__ev_fd_* already take and release the lock to protect their own > data structures etc. > > Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed
Ian Campbell writes ("Re: [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed"): > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > > * On libxl teardown, the watch fd should therefore be unregistered. > > assert that this is the case. > > A bunch of the patches in this series basically assume that the ctx is > idle when it is freed, i.e. it requires everything to be explicitly > cancelled rather than implicitly doing so on free. libxl_ctx_free explicitly disables all the application-requested event generators. (free_disable_deaths and libxl__evdisable_disk_eject.) Destroying the ctx during the execution of an asynchronous operation is forbidden by this text in libxl.h (near line 813): * *ao_how does not need to remain valid after the initiating function * returns. All other parameters must remain valid for the lifetime of * the asynchronous operation, unless otherwise specified. That implies that the ctx must remain valid during the ao, so it may not be destroyed beforehand. Those are the two ways that, even without any threads inside libxl, a ctx can be other than idle. It should be obvious to the application programmer that destroying the ctx when there are other threads inside libxl is not going to work. Indeed a programmer who tries to destroy the ctx when they have threads which might be inside libxl cannot ensure that the ctx is valid even on entry to libxl. > I think this is a fine restriction, but it probably ought to be written > down. Does the above demonstrate that the existing restrictions are documented ? I'd rather avoid writing the restrictions twice if it can be avoided - these docs are long enough as they are. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/6] libxl: events: Deregister xenstore watch fd when not needed
On Fri, 2014-11-28 at 14:56 +, Ian Jackson wrote: > Ian Campbell writes ("Re: [PATCH 2/6] libxl: events: Deregister xenstore > watch fd when not needed"): > > On Thu, 2014-11-27 at 18:27 +, Ian Jackson wrote: > > > * On libxl teardown, the watch fd should therefore be unregistered. > > > assert that this is the case. > > > > A bunch of the patches in this series basically assume that the ctx is > > idle when it is freed, i.e. it requires everything to be explicitly > > cancelled rather than implicitly doing so on free. > > libxl_ctx_free explicitly disables all the application-requested event > generators. (free_disable_deaths and libxl__evdisable_disk_eject.) So it does, I was looking for that before commenting but didn't see those calls for what they were, despite the comment right there... > Destroying the ctx during the execution of an asynchronous operation > is forbidden by this text in libxl.h (near line 813): > * *ao_how does not need to remain valid after the initiating function > * returns. All other parameters must remain valid for the lifetime of > * the asynchronous operation, unless otherwise specified. > That implies that the ctx must remain valid during the ao, so it may > not be destroyed beforehand. > > Those are the two ways that, even without any threads inside libxl, a > ctx can be other than idle. > > It should be obvious to the application programmer that destroying the > ctx when there are other threads inside libxl is not going to work. > Indeed a programmer who tries to destroy the ctx when they have > threads which might be inside libxl cannot ensure that the ctx is > valid even on entry to libxl. > > > I think this is a fine restriction, but it probably ought to be written > > down. > > Does the above demonstrate that the existing restrictions are > documented ? Yes. > I'd rather avoid writing the restrictions twice if it > can be avoided - these docs are long enough as they are. Indeed. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
create ^ title it QMP connection problems prevent libxl from calling libxl__device_pci_reset on domain shutdown thanks On Wed, 26 Nov 2014, Sander Eikelenboom wrote: > Hi, > > While testing a patch for Konrad i was wondering why "libxl_pci.c: > libxl__device_pci_reset()" > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci > passthrough. > xl didn't show any problems on the commandline so i never drawed much > attention > to it, but /var/log/xen/xl-guest.log shows: > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] > Domain 19 has shut down, reason code 0 0x0 > Action for shutdown reason code 0 is destroy > Domain 19 needs to be cleaned up: destroying the domain > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: Connection > reset by peer > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to connect to > QMP > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: > Connection reset by peer > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model already > exited > Done. Exiting now > > So it doesn't even get to calling "libxl_pci.c: libxl__device_pci_reset()". > > -- > Sander > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen/arm: Handle platforms with edge-triggered virtual timer
Some platforms (such as Xgene and ARMv8 models) use an edge-triggered interrupt for the virtual timer. Even if the timer output signal is masked in the context switch, the GIC will keep track that of any interrupts raised while IRQs are disabled. As soon as IRQs are re-enabled, the virtual interrupt timer will be injected to Xen. If an idle vVCPU was scheduled next then the interrupt handler doesn't expect to the receive the IRQ and will crash: (XEN)[<00228388>] _spin_lock_irqsave+0x28/0x94 (PC) (XEN)[<00228380>] _spin_lock_irqsave+0x20/0x94 (LR) (XEN)[<00250510>] vgic_vcpu_inject_irq+0x40/0x1b0 (XEN)[<0024bcd0>] vtimer_interrupt+0x4c/0x54 (XEN)[<00247010>] do_IRQ+0x1a4/0x220 (XEN)[<00244864>] gic_interrupt+0x50/0xec (XEN)[<0024fbac>] do_trap_irq+0x20/0x2c (XEN)[<00255240>] hyp_irq+0x5c/0x60 (XEN)[<00241084>] context_switch+0xb8/0xc4 (XEN)[<0022482c>] schedule+0x684/0x6d0 (XEN)[<0022785c>] __do_softirq+0xcc/0xe8 (XEN)[<002278d4>] do_softirq+0x14/0x1c (XEN)[<00240fac>] idle_loop+0x134/0x154 (XEN)[<0024c160>] start_secondary+0x14c/0x15c (XEN)[<0001>] 0001 The proper solution is to context switch the virtual interrupt state at the GIC level. This would also avoid masking the output signal which requires specific handling in the guest OS and more complex code in Xen to deal with EOIs, and so is desirable for that reason too. Sadly, this solution requires some refactoring which would not be suitable for a freeze exception for the Xen 4.5 release. For now implement a temporary solution which ignores the virtual timer interrupt when the idle VCPU is running. Signed-off-by: Julien Grall --- Changes in v2: - Reword the commit message and comment in the code to explain the real bug. Based on Ian's reword. - Use unlikely This patch is a bug fix candidate for Xen 4.5 and backport for Xen 4.4. It affects at least Xgene platform and ARMv8 models where Xen may randomly crash. This patch don't inject the virtual timer interrupt if the current VCPU is the idle one. For now, I think this patch is the safest way to resolve the problem. I will work on a proper solution for Xen 4.6. --- xen/arch/arm/time.c | 13 + 1 file changed, 13 insertions(+) diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c index a6436f1..471d7a9 100644 --- a/xen/arch/arm/time.c +++ b/xen/arch/arm/time.c @@ -169,6 +169,19 @@ static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) { +/* + * Edge-triggered interrupt can be used for the virtual timer. Even + * if the timer output signal is masked in the context switch, the + * GIC will keep track that of any interrupts raised while IRQS as + * disabled. As soon as IRQs are re-enabled, the virtual interrupt + * will be injected to Xen. + * + * If an IDLE vCPU was scheduled next then we should ignore the + * interrupt. + */ +if ( unlikely(is_idle_vcpu(current)) ) +return; + current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0); WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0); vgic_vcpu_inject_irq(current, current->arch.virt_timer.irq); -- 2.1.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-4.5 HVMOP ABI issues
>>> On 28.11.14 at 14:55, wrote: > The problem is with continuations which reuse the upper bits of the > input register, not with this HVMOP_op_mask specifically; the > HVMOP_op_mask simply adds to an existing problem. This is something > which needs considering urgently because, as you identify above, we have > already suffered an accidental ABI breakage with the mem-op widening. Since we can't retroactively fix the mem-op widening, I still don't see what's urgent here: As long as we don't change any of these masks, nothing bad is going to happen. Of course one thing to consider with this aspect in mind is whether to change the hvm-op or gnttab-op masks again _before_ 4.5 goes out, based on whether we feel they're wide enough for the (un)foreseeable future. > 32bit HVM guests reliably form a continuation on every single iteration > of a continuable hypercall (e.g. decrease reservation, which is the base > of my XSA-111 PoC), making it trivial to construct a migrateable HVM > domain which exposes the issue. Hmm, I can't offhand see why that would be, but what you write reads like you determined the reason? I ask because an unavoidable use of continuations is certainly nice for making sure those code paths get tested, but would be quite desirable to get eliminated at least for non-debug builds. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0. Shortly after the upgrade we started to lose network connectivity to the DomU a few times a day that required a reboot to fix. We see nothing in the xen logs or xl dmesg, but when we looked at the dmesg output we saw the following output for the two incidents we investigated in detail: [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144 [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device [69332.031069] br-default: port 2(vif4.0) entered disabled state [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656 [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device [824365.531191] br-default: port 2(vif9.0) entered disabled state We have a very similar setup running on another machine with a 3.17.3 DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this machine. This is a test system rather than a production system so has a different workload and fewer CPUs. The piece of code that outputs the error is in drivers/net/xen-netback/netback.c. The DomU has 4000MB of RAM and 8 CPUs. Any ideas? Thanks, Anthony. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
On Fri, 2014-11-28 at 15:19 +, Anthony Wright wrote: > We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to > 3.17.3 Is this a Debian kernel? In which case you might be seeing https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 , this will be fixed in the next upload of the kernel, test binaries with the fixes are referenced in the bug log. Even if not Debian then you'll probably want the same set of backports. Ian. > running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0. > > Shortly after the upgrade we started to lose network connectivity to the > DomU a few times a day that required a reboot to fix. We see nothing in > the xen logs or xl dmesg, but when we looked at the dmesg output we saw > the following output for the two incidents we investigated in detail: > > [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144 > [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device > [69332.031069] br-default: port 2(vif4.0) entered disabled state > > > [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656 > [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device > [824365.531191] br-default: port 2(vif9.0) entered disabled state > > We have a very similar setup running on another machine with a 3.17.3 > DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this > machine. This is a test system rather than a production system so has a > different workload and fewer CPUs. > > The piece of code that outputs the error is in > drivers/net/xen-netback/netback.c. > > The DomU has 4000MB of RAM and 8 CPUs. > > Any ideas? > > Thanks, > > Anthony. > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/arm: Handle platforms with edge-triggered virtual timer
I forgot to add "for 4.5" in the commit title. On 28/11/14 15:17, Julien Grall wrote: > This patch is a bug fix candidate for Xen 4.5 and backport for Xen 4.4. > It affects at least Xgene platform and ARMv8 models where Xen may > randomly crash. Thinking a bit more to this. I believe it's possible for a guest to crash the host if the next timer deadline is short enough and it execute a yield. Might be unlikely ... Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC V8 2/3] libxl domain snapshot API design
On Tue, 2014-11-25 at 02:08 -0700, Chun Yan Liu wrote: > Hi, Ian, > > According to previous discussion, snapshot delete and revert are > inclined to be done by high level application itself, won't supply a > libxl API. I thought you had explained a scenario where the toolstack needed to be at least aware of delete, specifically when you are deleting a snapshot from the middle of an active chain. Maybe that's not "snapshot delete API in libxl" though, but rather a notification API which the toolstack can use to tell libxl something is going on. > I'm wondering snapshot create need a new common API? > In fact its main work is save domain and take disk snapshot, xl can > do it too. I don't believe xl can take a disk snapshot of an active disk, it doesn't have the machinery to deal with that sort of thing, nor should it, this is exactly the sort of thing which libxl is provided to deal with. Also, libxl is driving the migration/memory snapshot, and I think the disk snapshot fundamentally needs to be involved in that process, not done separately by the toolstack. > I just write down an overview of the snapshot work (see below). > The problem is: do we need to export API? What kind of API? > In updating Bamvor's code, I think xl can do all the work, libvirt can > do the work too even without libxl's help. > > Of course, there are some thing if put in libxl, it will be easier to > use, like the domain snapshot info structure, gentype.py will > directly generate useful init/dispose/to_json/from_json functions. > Or the disk snapshot part can be extracted and placed in libxl or libxlu. > > Any suggestions about which part is better to be extracted as libxl > API or better not? > > Thanks, > Chunyan > > -- > libxl domain snapshot overview Just to be 100% clear: This is an overview of a domain snapshot architecture for a toolstack which uses libxl. A bunch of the things described here belong to the toolstack and not to libxl itself. I've tried to read with that in mind but a complete document should mention this and be careful to be clear about the distinction where it matters. > 0. Glossary [...] > * not support disk-only snapshot [1]. > > [1] > This is different from "libvirt". > To xl, it only concerns active domains, and even when domain > is paused, there is no data flush to disk operation. So, take > a disk-only snapshot and then resume, it is as if the guest > had crashed. For this reason, disk-only snapshot is meaningless > to xl. Should not support. > > To libvirt, it has active domains and inactive domains, for > the active domains, as "xl", it's meaning less to take disk-only > snapshot, but for inactive domains, disk-only snapshot is valid. > Should support. Do you mean to say here that disk-only snapshots are not supported in some toolstacks, or in no toolstack? Or are you just saying that libxl doesn't need to support them because they only apply to inactive domains? In either case it seems to me like your footnote is saying that you *do* want to support disk-only snapshots, at least in some stacks and/or configurations. I think you probably mean to say that disk-only snapshots of *active* domains are not supported. Whereas disk-only snapshots of inactive domains may or may not be depending on the toolstack. > > 2. Requirements > > General Requirements: > * ability to save/restore domain memory > * ability to create/delete/apply disk snapshot [2] > * ability to parse user config file > * ability to save/load/update domain snapshot metadata (or called > domain snapshot info, the metadata at least includes: > snapshot name, create time, description, memory state file, > disk snapshot info, parent (in snapshot chain), current (is > currently applied)) > > [2] Disk snapshot requirements: > * external tools: qemu-img, lvcreate, vhd-util, etc. > * For a basic goal, we support 'raw' and 'qcow2' backend types only. > Then only requires qemu: > use libxl qmp command (better) or "qemu-img" You should leave these implementation details for a later section, in this context they just invite quibbling about whether things belong in libxl etc and whether qmp commands are "better". The rest looks ok, but without the remainder of the design described in terms of the concepts given here it's hard to comment further. I'd suggest putting this all into one coherent document (not 3 as before) which starts by describing the terminology (section 0 in your mail which I'm replying to now), then gives an overview of the architecture (the rest of that mail), then describe which components (libxl, toolstack, etc) implement each bit of the architecture, then describe the libxl API which makes this possible (covered in previous mails I think). I think you have most of the words either here or from the other mails, they just need putting togethe
Re: [Xen-devel] Xen-4.5 HVMOP ABI issues
On 28/11/14 15:18, Jan Beulich wrote: On 28.11.14 at 14:55, wrote: >> The problem is with continuations which reuse the upper bits of the >> input register, not with this HVMOP_op_mask specifically; the >> HVMOP_op_mask simply adds to an existing problem. This is something >> which needs considering urgently because, as you identify above, we have >> already suffered an accidental ABI breakage with the mem-op widening. > Since we can't retroactively fix the mem-op widening, I still don't see > what's urgent here: As long as we don't change any of these masks, > nothing bad is going to happen. Of course one thing to consider with > this aspect in mind is whether to change the hvm-op or gnttab-op > masks again _before_ 4.5 goes out, based on whether we feel they're > wide enough for the (un)foreseeable future. By urgent, I mean exactly this, while we have the ability to tweak the masks. > >> 32bit HVM guests reliably form a continuation on every single iteration >> of a continuable hypercall (e.g. decrease reservation, which is the base >> of my XSA-111 PoC), making it trivial to construct a migrateable HVM >> domain which exposes the issue. > Hmm, I can't offhand see why that would be, but what you write > reads like you determined the reason? I have never identified why, but it is reliable. c/s 79de2d31f1 proves that some preemption condition is true for 32bit HVM guests by the time the hypercall handler is called. It is unreliable with 64bit HVM guests, suggesting that the compat translation layer may be the root cause of the issue. > I ask because an unavoidable > use of continuations is certainly nice for making sure those code paths > get tested, but would be quite desirable to get eliminated at least for > non-debug builds. While the preemption code is necessary to avoid spending ludicrous quantities of time synchronously in Xen, it is currently having the worst possible effect it could have on the system, by causing 32bit HVM guests to undergo a vmentry/vmexit and double compat translation for each iteration. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] get a handle for the tap device to shut it down
Xen does a shutdown of the emulated PCI network device in pci_unplug_nics. But this just disables the PCI device. The tap device for a given emulated card remains active because nothing closes the file descriptor. The cmdline for qemu contains something like "-device rtl8139,id=nic0,netdev=net0,mac=00:16:3e:28:f1:ee -netdev type=tap,id=net0,ifname=vif1.0-emu,script=no,downscript=no". I wonder if the missing disable of the tap device is intentional, or just an oversight, or if its just to complicated to get from a "PCIDevice *" to the other end and call the ->cleanup function. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
On 28/11/2014 15:23, Ian Campbell wrote: > On Fri, 2014-11-28 at 15:19 +, Anthony Wright wrote: >> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to >> 3.17.3 > Is this a Debian kernel? In which case you might be seeing It's a stock kernel from kernel.org, we have a custom system with no relation to Debian. > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 , this will be > fixed in the next upload of the kernel, test binaries with the fixes are > referenced in the bug log. The error messages we're seeing are different from those reported, both the Dom0 and DomU continue to run correctly and the vif doesn't degrade slowly it fails the test in netback.c below which disables the interface: /* No crossing a page as the payload mustn't fragment. */ if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) { netdev_err(queue->vif->dev, "txreq.offset: %x, size: %u, end: %lu\n", txreq.offset, txreq.size, (txreq.offset&~PAGE_MASK) + txreq.size); xenvif_fatal_tx_err(queue->vif); break; } > Even if not Debian then you'll probably want the same set of backports. I'm happy to apply the backports if you think it's likely to fix the problem despite the different symptoms, but from what I can see it looks like a different problem. thanks, Anthony > Ian. >> running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0. >> >> Shortly after the upgrade we started to lose network connectivity to the >> DomU a few times a day that required a reboot to fix. We see nothing in >> the xen logs or xl dmesg, but when we looked at the dmesg output we saw >> the following output for the two incidents we investigated in detail: >> >> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144 >> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device >> [69332.031069] br-default: port 2(vif4.0) entered disabled state >> >> >> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656 >> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device >> [824365.531191] br-default: port 2(vif9.0) entered disabled state >> >> We have a very similar setup running on another machine with a 3.17.3 >> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this >> machine. This is a test system rather than a production system so has a >> different workload and fewer CPUs. >> >> The piece of code that outputs the error is in >> drivers/net/xen-netback/netback.c. >> >> The DomU has 4000MB of RAM and 8 CPUs. >> >> Any ideas? >> >> Thanks, >> >> Anthony. >> >> ___ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote: > create ^ > title it QMP connection problems prevent libxl from calling > libxl__device_pci_reset on domain shutdown > thanks > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote: > > Hi, > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: > > libxl__device_pci_reset()" > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci > > passthrough. > > xl didn't show any problems on the commandline so i never drawed much > > attention > > to it, but /var/log/xen/xl-guest.log shows: > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] > > Domain 19 has shut down, reason code 0 0x0 > > Action for shutdown reason code 0 is destroy > > Domain 19 needs to be cleaned up: destroying the domain > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: Connection > > reset by peer > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to connect > > to QMP > > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: > > Connection reset by peer > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model already > > exited > > Done. Exiting now > > > > So it doesn't even get to calling "libxl_pci.c: libxl__device_pci_reset()". > > > > -- It's worth checking whether the device model exits too early, i.e., did it crash? What's in the DM log? Wei. > > Sander > > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH for-4.5] reset PCI devices on force removal even when QEMU returns error
On do_pci_remove when QEMU returns error, we just bail out early without resetting the device. On domain shutdown we are racing with QEMU exiting and most often QEMU closes the QMP connection before executing the requested command. In these cases if force=1, it makes sense to go ahead with rest of the PCI device removal, that includes resetting the device and calling xc_deassign_device. Otherwise we risk not resetting the device properly on domain shutdown. Signed-off-by: Stefano Stabellini diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 316643c..0ac0b93 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1243,7 +1245,7 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid, rc = ERROR_INVAL; goto out_fail; } -if (rc) { +if (rc && !force) { rc = ERROR_FAIL; goto out_fail; } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
On Fri, 28 Nov 2014, Wei Liu wrote: > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote: > > create ^ > > title it QMP connection problems prevent libxl from calling > > libxl__device_pci_reset on domain shutdown > > thanks > > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote: > > > Hi, > > > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: > > > libxl__device_pci_reset()" > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci > > > passthrough. > > > xl didn't show any problems on the commandline so i never drawed much > > > attention > > > to it, but /var/log/xen/xl-guest.log shows: > > > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] > > > Domain 19 has shut down, reason code 0 0x0 > > > Action for shutdown reason code 0 is destroy > > > Domain 19 needs to be cleaned up: destroying the domain > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: Connection > > > reset by peer > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to > > > connect to QMP > > > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: > > > Connection reset by peer > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model already > > > exited > > > Done. Exiting now > > > > > > So it doesn't even get to calling "libxl_pci.c: > > > libxl__device_pci_reset()". > > > > > > -- > > It's worth checking whether the device model exits too early, i.e., did > it crash? What's in the DM log? Firstly I was thinking that if force=1 it makes sense to continue even when QEMU returns error, see: http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [linux-linus bisection] complete test-amd64-i386-xl-credit2
branch xen-unstable xen branch xen-unstable job test-amd64-i386-xl-credit2 test guest-saverestore Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git Tree: xen git://xenbits.xen.org/xen.git *** Found and reproduced problem changeset *** Bug is in tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git Bug introduced: c6c9161d064d30e78904f3affe5184487493e0fc Bug not present: 8b2ed21e846c63d8f1bdee0d8df0645721a604a1 commit c6c9161d064d30e78904f3affe5184487493e0fc Merge: 8b2ed21 b5e212a Author: Linus Torvalds Date: Fri Nov 21 15:46:17 2014 -0800 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Misc fixes: - gold linker build fix - noxsave command line parsing fix - bugfix for NX setup - microcode resume path bug fix - _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious lockup thread" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 x86, kaslr: Handle Gold linker for finding bss/brk x86, mm: Set NX across entire PMD at boot x86, microcode: Update BSPs microcode on resume x86: Require exact match for 'noxsave' command line option commit b5e212a3051b65e426a513901d9c7001681c7215 Author: Andy Lutomirski Date: Wed Nov 19 13:56:19 2014 -0800 x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SINGLESTEP), not (1<<19). This code is involved in Dave's trinity lockup, but I don't see why it would cause any of the problems he's seeing, except inadvertently by causing a different path through entry_64.S's syscall handling. Signed-off-by: Andy Lutomirski Cc: Don Zickus Cc: Peter Zijlstra Cc: Dave Jones Cc: Linus Torvalds Link: http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.l...@amacapital.net Signed-off-by: Thomas Gleixner commit 70b61e362187b5fccac206506d402f3424e3e749 Author: Kees Cook Date: Mon Nov 17 16:16:04 2014 -0800 x86, kaslr: Handle Gold linker for finding bss/brk When building with the Gold linker, the .bss and .brk areas of vmlinux are shown as consecutive instead of having the same file offset. Allow for either state, as long as things add up correctly. Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd") Reported-by: Markus Trippelsdorf Signed-off-by: Kees Cook Cc: Junjie Mao Link: http://lkml.kernel.org/r/20141118001604.ga25...@www.outflux.net Cc: sta...@vger.kernel.org Signed-off-by: Thomas Gleixner commit 45e2a9d4701d8c624d4a4bcdd1084eae31e92f58 Author: Kees Cook Date: Fri Nov 14 11:47:37 2014 -0800 x86, mm: Set NX across entire PMD at boot When setting up permissions on kernel memory at boot, the end of the PMD that was split from bss remained executable. It should be NX like the rest. This performs a PMD alignment instead of a PAGE alignment to get the correct span of memory. Before: ---[ High Kernel Mapping ]--- ... 0x8202d000-0x8220 1868K RW GLB NX pte 0x8220-0x82c010M RW PSE GLB NX pmd 0x82c0-0x82df5000 2004K RW GLB NX pte 0x82df5000-0x82e044K RW GLB x pte 0x82e0-0xc000 978M pmd After: ---[ High Kernel Mapping ]--- ... 0x8202d000-0x8220 1868K RW GLB NX pte 0x8220-0x82e012M RW PSE GLB NX pmd 0x82e0-0xc000 978M pmd [ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment. We really should unmap the reminder along with the holes caused by init,initdata etc. but thats a different issue ] Signed-off-by: Kees Cook Cc: Andy Lutomirski Cc: Toshi Kani Cc: Yasuaki Ishimatsu Cc: David Vrabel Cc: Wang Nan Cc: Yinghai Lu Cc: sta...@vger.kernel.org Link: http://lkml.kernel.org/r/20141114194737.ga3...@www.outflux.net Signed-off-by: Thomas Gleixner commit fb86b97300d930b57471068720c52bfa8622eab7 Autho
Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote: > On Fri, 28 Nov 2014, Wei Liu wrote: > > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote: > > > create ^ > > > title it QMP connection problems prevent libxl from calling > > > libxl__device_pci_reset on domain shutdown > > > thanks > > > > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote: > > > > Hi, > > > > > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: > > > > libxl__device_pci_reset()" > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with pci > > > > passthrough. > > > > xl didn't show any problems on the commandline so i never drawed much > > > > attention > > > > to it, but /var/log/xen/xl-guest.log shows: > > > > > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] > > > > Domain 19 has shut down, reason code 0 0x0 > > > > Action for shutdown reason code 0 is destroy > > > > Domain 19 needs to be cleaned up: destroying the domain > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: > > > > Connection reset by peer > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to > > > > connect to QMP > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: > > > > Connection reset by peer > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model > > > > already exited > > > > Done. Exiting now > > > > > > > > So it doesn't even get to calling "libxl_pci.c: > > > > libxl__device_pci_reset()". > > > > > > > > -- > > > > It's worth checking whether the device model exits too early, i.e., did > > it crash? What's in the DM log? > > Firstly I was thinking that if force=1 it makes sense to continue even > when QEMU returns error, see: > > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com In normal case DM is not destroyed until PCI device is removed. So I think the real issue is DM crashed. libxl.c: 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0) 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for domid %d", domid); 1573 rc = xc_domain_pause(ctx->xch, domid); 1574 if (rc < 0) { 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause failed for %d", domid); 1576 } 1577 if (dm_present) { 1578 if (libxl__destroy_device_model(gc, domid) < 0) 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl__destroy_device_model failed for %d", domid); 1580 1581 libxl__qmp_cleanup(gc, domid); 1582 } The patch you posted covers up the real issue. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
On Fri, 28 Nov 2014, Wei Liu wrote: > On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote: > > On Fri, 28 Nov 2014, Wei Liu wrote: > > > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote: > > > > create ^ > > > > title it QMP connection problems prevent libxl from calling > > > > libxl__device_pci_reset on domain shutdown > > > > thanks > > > > > > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote: > > > > > Hi, > > > > > > > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: > > > > > libxl__device_pci_reset()" > > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with > > > > > pci passthrough. > > > > > xl didn't show any problems on the commandline so i never drawed much > > > > > attention > > > > > to it, but /var/log/xen/xl-guest.log shows: > > > > > > > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] > > > > > Domain 19 has shut down, reason code 0 0x0 > > > > > Action for shutdown reason code 0 is destroy > > > > > Domain 19 needs to be cleaned up: destroying the domain > > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: > > > > > Connection reset by peer > > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to > > > > > connect to QMP > > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: libxl__qmp_pci_del: > > > > > Connection reset by peer > > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model > > > > > already exited > > > > > Done. Exiting now > > > > > > > > > > So it doesn't even get to calling "libxl_pci.c: > > > > > libxl__device_pci_reset()". > > > > > > > > > > -- > > > > > > It's worth checking whether the device model exits too early, i.e., did > > > it crash? What's in the DM log? > > > > Firstly I was thinking that if force=1 it makes sense to continue even > > when QEMU returns error, see: > > > > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com > > In normal case DM is not destroyed until PCI device is removed. So I > think the real issue is DM crashed. I don't think it crashed: QEMU detects that domain shutdown has been called and exits. See: static void main_loop(void) { bool nonblocking; int last_io = 0; #ifdef CONFIG_PROFILER int64_t ti; #endif do { [...] } while (!main_loop_should_exit()); } main_loop_should_exit returns true when qemu_shutdown_requested(). > libxl.c: > 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0) > 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for domid > %d", domid); > 1573 rc = xc_domain_pause(ctx->xch, domid); > 1574 if (rc < 0) { > 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause > failed for %d", domid); > 1576 } > 1577 if (dm_present) { > 1578 if (libxl__destroy_device_model(gc, domid) < 0) > 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, > "libxl__destroy_device_model failed for %d", domid); > 1580 > 1581 libxl__qmp_cleanup(gc, domid); > 1582 } > > The patch you posted covers up the real issue. That is true, but I think the patch is correct regardless of the issue. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
Friday, November 28, 2014, 6:09:52 PM, you wrote: > On Fri, 28 Nov 2014, Wei Liu wrote: >> On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote: >> > On Fri, 28 Nov 2014, Wei Liu wrote: >> > > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote: >> > > > create ^ >> > > > title it QMP connection problems prevent libxl from calling >> > > > libxl__device_pci_reset on domain shutdown >> > > > thanks >> > > > >> > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote: >> > > > > Hi, >> > > > > >> > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: >> > > > > libxl__device_pci_reset()" >> > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) with >> > > > > pci passthrough. >> > > > > xl didn't show any problems on the commandline so i never drawed >> > > > > much attention >> > > > > to it, but /var/log/xen/xl-guest.log shows: >> > > > > >> > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] >> > > > > Domain 19 has shut down, reason code 0 0x0 >> > > > > Action for shutdown reason code 0 is destroy >> > > > > Domain 19 needs to be cleaned up: destroying the domain >> > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: >> > > > > Connection reset by peer >> > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to >> > > > > connect to QMP >> > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: >> > > > > libxl__qmp_pci_del: Connection reset by peer >> > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model >> > > > > already exited >> > > > > Done. Exiting now >> > > > > >> > > > > So it doesn't even get to calling "libxl_pci.c: >> > > > > libxl__device_pci_reset()". >> > > > > >> > > > > -- >> > > >> > > It's worth checking whether the device model exits too early, i.e., did >> > > it crash? What's in the DM log? >> > >> > Firstly I was thinking that if force=1 it makes sense to continue even >> > when QEMU returns error, see: >> > >> > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com >> >> In normal case DM is not destroyed until PCI device is removed. So I >> think the real issue is DM crashed. > I don't think it crashed: QEMU detects that domain shutdown has been > called and exits. > See: > static void main_loop(void) > { > bool nonblocking; > int last_io = 0; > #ifdef CONFIG_PROFILER > int64_t ti; > #endif > do { > [...] > } while (!main_loop_should_exit()); > } > main_loop_should_exit returns true when qemu_shutdown_requested(). >> libxl.c: >> 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0) >> 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for >> domid %d", domid); >> 1573 rc = xc_domain_pause(ctx->xch, domid); >> 1574 if (rc < 0) { >> 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause >> failed for %d", domid); >> 1576 } >> 1577 if (dm_present) { >> 1578 if (libxl__destroy_device_model(gc, domid) < 0) >> 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, >> "libxl__destroy_device_model failed for %d", domid); >> 1580 >> 1581 libxl__qmp_cleanup(gc, domid); >> 1582 } >> >> The patch you posted covers up the real issue. > That is true, but I think the patch is correct regardless of the issue. Hmm how about qemu's "-no-shutdown" option ? >From the manpage: -no-shutdown Don't exit QEMU on guest shutdown, but instead only stop the emulation. This allows for instance switching to monitor to commit changes to the disk image. with: device_model_args_hvm = [ '-no-shutdown' ] I get: /var/log/xen/xl-tv.log: Waiting for domain tv (domid 18) to die [pid 18587] Domain 18 has shut down, reason code 0 0x0 Action for shutdown reason code 0 is destroy Domain 18 needs to be cleaned up: destroying the domain libxl: error: libxl_pci.c:1299:do_pci_remove: xc_domain_irq_permission irq=40: Invalid argument libxl: error: libxl_pci.c:1012:libxl__device_pci_reset: ?!?!? Me here Done. Exiting now /var/log/xen/qemu-dm-tv.log: char device redirected to /dev/pts/21 (label serial0) VNC server running on `127.0.0.1:5901' xen be: vkbd-0: initialise() failed xen be: vkbd-0: initialise() failed xen be: vkbd-0: initialise() failed Issued domain 18 poweroff qemu: terminating on signal 1 from pid 18587 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-mainline test] 31886: regressions - FAIL
flight 31886 qemu-mainline real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/31886/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemut-debianhvm-amd64 5 xen-boot fail REGR. vs. 31855 Regressions which are regarded as allowable (not blocking): test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 31855 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 9 guest-start fail never pass test-amd64-i386-libvirt 9 guest-start fail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-libvirt 9 guest-start fail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass version targeted for testing: qemuu490309fcfbed9fa1ed357541f609975016a34628 baseline version: qemuuca6028185d19d3f2bd331c15175c3ef5afc30c77 People who touched revisions under test: Christian Borntraeger Don Slutz Eric Blake Gerd Hoffmann Gonglei Markus Armbruster Paolo Bonzini Peter Maydell jobs: build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64fail test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-amd64-amd64-xl-win7-amd64 fail test-amd64-i386-xl-win7-amd64fail test-amd64-i386-xl-credit2 pass test-amd64-i386-freebsd10-i386 pass test-amd64-amd64-xl-pcipt-intel fail test-amd64-i386-rhel6hvm-intel pass test-amd64-i386-qemut-rhel6hvm-intel pass test-amd64-i386-qemuu-rhel6hvm-intel pass test-amd64-amd64-libvirt
Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
On Fri, 28 Nov 2014, Sander Eikelenboom wrote: > Friday, November 28, 2014, 6:09:52 PM, you wrote: > > > On Fri, 28 Nov 2014, Wei Liu wrote: > >> On Fri, Nov 28, 2014 at 04:54:19PM +, Stefano Stabellini wrote: > >> > On Fri, 28 Nov 2014, Wei Liu wrote: > >> > > On Fri, Nov 28, 2014 at 03:08:51PM +, Stefano Stabellini wrote: > >> > > > create ^ > >> > > > title it QMP connection problems prevent libxl from calling > >> > > > libxl__device_pci_reset on domain shutdown > >> > > > thanks > >> > > > > >> > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote: > >> > > > > Hi, > >> > > > > > >> > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: > >> > > > > libxl__device_pci_reset()" > >> > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) > >> > > > > with pci passthrough. > >> > > > > xl didn't show any problems on the commandline so i never drawed > >> > > > > much attention > >> > > > > to it, but /var/log/xen/xl-guest.log shows: > >> > > > > > >> > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] > >> > > > > Domain 19 has shut down, reason code 0 0x0 > >> > > > > Action for shutdown reason code 0 is destroy > >> > > > > Domain 19 needs to be cleaned up: destroying the domain > >> > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: > >> > > > > Connection reset by peer > >> > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to > >> > > > > connect to QMP > >> > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: > >> > > > > libxl__qmp_pci_del: Connection reset by peer > >> > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model > >> > > > > already exited > >> > > > > Done. Exiting now > >> > > > > > >> > > > > So it doesn't even get to calling "libxl_pci.c: > >> > > > > libxl__device_pci_reset()". > >> > > > > > >> > > > > -- > >> > > > >> > > It's worth checking whether the device model exits too early, i.e., did > >> > > it crash? What's in the DM log? > >> > > >> > Firstly I was thinking that if force=1 it makes sense to continue even > >> > when QEMU returns error, see: > >> > > >> > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com > >> > >> In normal case DM is not destroyed until PCI device is removed. So I > >> think the real issue is DM crashed. > > > I don't think it crashed: QEMU detects that domain shutdown has been > > called and exits. > > See: > > > static void main_loop(void) > > { > > bool nonblocking; > > int last_io = 0; > > #ifdef CONFIG_PROFILER > > int64_t ti; > > #endif > > do { > > > [...] > > > } while (!main_loop_should_exit()); > > } > > > > main_loop_should_exit returns true when qemu_shutdown_requested(). > > > >> libxl.c: > >> 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0) > >> 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for > >> domid %d", domid); > >> 1573 rc = xc_domain_pause(ctx->xch, domid); > >> 1574 if (rc < 0) { > >> 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, > >> "xc_domain_pause failed for %d", domid); > >> 1576 } > >> 1577 if (dm_present) { > >> 1578 if (libxl__destroy_device_model(gc, domid) < 0) > >> 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, > >> "libxl__destroy_device_model failed for %d", domid); > >> 1580 > >> 1581 libxl__qmp_cleanup(gc, domid); > >> 1582 } > >> > >> The patch you posted covers up the real issue. > > > That is true, but I think the patch is correct regardless of the issue. > > Hmm how about qemu's "-no-shutdown" option ? > >From the manpage: > -no-shutdown > Don't exit QEMU on guest shutdown, but instead only stop the emulation. > This allows for instance switching to monitor to commit changes to the disk > image. > > with: > device_model_args_hvm = [ '-no-shutdown' ] > > I get: > > /var/log/xen/xl-tv.log: > > Waiting for domain tv (domid 18) to die [pid 18587] > Domain 18 has shut down, reason code 0 0x0 > Action for shutdown reason code 0 is destroy > Domain 18 needs to be cleaned up: destroying the domain > libxl: error: libxl_pci.c:1299:do_pci_remove: xc_domain_irq_permission > irq=40: Invalid argument Thanks for testing! The suggestion might be a good idea, but I am not sure whether it is suitable given the stage of the release cycle. I would wait until 4.6 before making that change. Regarding the xc_domain_irq_permission error, it is just a matter of calling xc_physdev_unmap_pirq after xc_domain_irq_permission: diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 316643c..904d255 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1288,14 +1290,14 @@ skip1: goto out; } if ((fscanf(f, "%u", &irq) == 1) && irq) { -rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq); -if (rc < 0) { -LIBXL__LOG_ERRNO
Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT
On Thu, Nov 27, 2014 at 06:50:31PM +, Andrew Cooper wrote: > On 27/11/14 18:36, Luis R. Rodriguez wrote: > > On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote: > >> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote: > >>> From: "Luis R. Rodriguez" > >>> > >>> Some folks had reported that some xen hypercalls take a long time > >>> to complete when issued from the userspace private ioctl mechanism, > >>> this can happen for instance with some hypercalls that have many > >>> sub-operations, this can happen for instance on hypercalls that use > >>> multi-call feature whereby Xen lets one hypercall batch out a series > >>> of other hypercalls on the hypervisor. At times such hypercalls can > >>> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default > >>> 120 seconds), this a non-issue issue on preemptible kernels though as > >>> the kernel may deschedule such long running tasks. Xen for instance > >>> supports multicalls to be preempted as well, this is what Xen calls > >>> continuation (see xen commit 42217cbc5b which introduced this [0]). > >>> On systems without CONFIG_PREEMPT though -- a kernel with voluntary > >>> or no preemption -- a long running hypercall will not be descheduled > >>> until the hypercall is complete and the ioctl returns to user space. > >>> > >>> To help with this David had originally implemented support for use > >>> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This > >>> solution never went upstream though and upon review to help refactor > >>> this I've concluded that usage of preempt_schedule_irq() would be > >>> a bit abussive of existing APIs -- for a few reasons: > >>> > >>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels > >>> > >>> 1) we want try to consider solutions that might work for other > >>> hypervisors for this same problem, and identify it its an issue > >>> even present on other hypervisors or if this is a self > >>> inflicted architectural issue caused by use of multicalls > >>> > >>> 2) there is no documentation or profiling of the exact hypercalls > >>> that were causing these issues, nor do we have any context > >>> to help evaluate this any further > >>> > >>> I at least checked with kvm folks and it seems hypercall preemption > >>> is not needed there. We can survey other hypervisors... > >>> > >>> If 'something like preemption' is needed then CONFIG_PREEMPT > >>> should just be enabled and encouraged, it seems we want to > >>> encourage CONFIG_PREEMPT on xen, specially when multicalls are > >>> used. In the meantime this tries to address a solution to help > >>> xen on non CONFIG_PREEMPT kernels. > >>> > >>> One option tested and evaluated was to put private hypercalls in > >>> process context, however this would introduce complexities such > >>> originating hypercalls from different contexts. Current xen > >>> hypercall callback handlers would need to be changed per architecture, > >>> for instance, we'd also incur the cost of switching states from > >>> user / kernel (this cost is also present if preempt_schedule_irq() > >>> is used). There may be other issues which could be introduced with > >>> this strategy as well. The simplest *shared* alternative is instead > >>> to just explicitly schedule() at the end of a private hypercall on non > >>> preempt kernels. This forces our private hypercall call mechanism > >>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of > >>> more context switch but keeps the private hypercall context intact. > >>> > >>> [0] > >>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9 > >>> [1] > >>> http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch > >>> > >>> Cc: Davidlohr Bueso > >>> Cc: Joerg Roedel > >>> Cc: Borislav Petkov > >>> Cc: Konrad Rzeszutek Wilk > >>> Cc: Jan Beulich > >>> Cc: Juergen Gross > >>> Cc: Olaf Hering > >>> Cc: David Vrabel > >>> Signed-off-by: Luis R. Rodriguez > >>> --- > >>> drivers/xen/privcmd.c | 3 +++ > >>> 1 file changed, 3 insertions(+) > >>> > >>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c > >>> index 569a13b..e29edba 100644 > >>> --- a/drivers/xen/privcmd.c > >>> +++ b/drivers/xen/privcmd.c > >>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata) > >>> hypercall.arg[0], hypercall.arg[1], > >>> hypercall.arg[2], hypercall.arg[3], > >>> hypercall.arg[4]); > >>> +#ifndef CONFIG_PREEMPT > >>> + schedule(); > >>> +#endif > >>> > >>> return ret; > >>> } > >>> > >> Sorry, I don't think this will solve anything. You're calling schedule() > >> right after the long running hypercall just nanoseconds before returning > >> to the user. > > Yeah, well that is what [1] tried as well only it tried usi