Re: [Xen-devel] [PATCH v4 for 4.5] x86/VPMU: Clear last_vcpu when destroying VPMU
> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] > Sent: Tuesday, January 06, 2015 12:33 AM > > On Thu, Dec 18, 2014 at 01:06:40PM -0500, Boris Ostrovsky wrote: > > We need to make sure that last_vcpu is not pointing to VCPU whose > > VPMU is being destroyed. Otherwise we may try to dereference it in > > the future, when VCPU is gone. > > > > We have to do this via IPI since otherwise there is a (somewheat > > theoretical) chance that between test and subsequent clearing > > of last_vcpu the remote processor (i.e. vpmu->last_pcpu) might do > > both vpmu_load() and then vpmu_save() for another VCPU. The former > > will clear last_vcpu and the latter will set it to something else. > > > > Performing this operation via IPI will guarantee that nothing can > > happen on the remote processor between testing and clearing of > > last_vcpu. > > > > We should also check for VPMU_CONTEXT_ALLOCATED in vpmu_destroy() to > > avoid unnecessary percpu tests and arch-specific destroy ops. Thus > > checks in AMD and Intel routines are no longer needed. > > > > Signed-off-by: Boris Ostrovsky > > --- > > xen/arch/x86/hvm/svm/vpmu.c |3 --- > > xen/arch/x86/hvm/vmx/vpmu_core2.c |2 -- > > xen/arch/x86/hvm/vpmu.c | 20 > > 3 files changed, 20 insertions(+), 5 deletions(-) > > CC-ing the rest of the maintainers (Intel ones, since Boris is > on the AMD side). > > I am OK with this patch going in Xen 4.5 as for one thing to actually > use vpmu you have to pass 'vpmu=1' on the Xen command line. > > Aka, Release-Acked-by: Konrad Rzeszutek Wilk Acked-by: Kevin Tian > > > > > Changes in v4: > > * Back to v2's IPI implementation > > > > Changes in v3: > > * Use cmpxchg instead of IPI > > * Use correct routine names in commit message > > * Remove duplicate test for VPMU_CONTEXT_ALLOCATED in arch-specific > >destroy ops > > > > Changes in v2: > > * Test last_vcpu locally before IPI > > * Don't handle local pcpu as a special case --- on_selected_cpus > >will take care of that > > * Dont't cast variables unnecessarily > > > > > > diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c > > index 8e07a98..4c448bb 100644 > > --- a/xen/arch/x86/hvm/svm/vpmu.c > > +++ b/xen/arch/x86/hvm/svm/vpmu.c > > @@ -403,9 +403,6 @@ static void amd_vpmu_destroy(struct vcpu *v) > > { > > struct vpmu_struct *vpmu = vcpu_vpmu(v); > > > > -if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > > -return; > > - > > if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) > > amd_vpmu_unset_msr_bitmap(v); > > > > diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c > b/xen/arch/x86/hvm/vmx/vpmu_core2.c > > index 68b6272..590c2a9 100644 > > --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c > > +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c > > @@ -818,8 +818,6 @@ static void core2_vpmu_destroy(struct vcpu *v) > > struct vpmu_struct *vpmu = vcpu_vpmu(v); > > struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; > > > > -if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > > -return; > > xfree(core2_vpmu_cxt->pmu_enable); > > xfree(vpmu->context); > > if ( cpu_has_vmx_msr_bitmap ) > > diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c > > index 1df74c2..37f0d9f 100644 > > --- a/xen/arch/x86/hvm/vpmu.c > > +++ b/xen/arch/x86/hvm/vpmu.c > > @@ -247,10 +247,30 @@ void vpmu_initialise(struct vcpu *v) > > } > > } > > > > +static void vpmu_clear_last(void *arg) > > +{ > > +if ( this_cpu(last_vcpu) == arg ) > > +this_cpu(last_vcpu) = NULL; > > +} > > + > > void vpmu_destroy(struct vcpu *v) > > { > > struct vpmu_struct *vpmu = vcpu_vpmu(v); > > > > +if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > > +return; > > + > > +/* > > + * Need to clear last_vcpu in case it points to v. > > + * We can check here non-atomically whether it is 'v' since > > + * last_vcpu can never become 'v' again at this point. > > + * We will test it again in vpmu_clear_last() with interrupts > > + * disabled to make sure we don't clear someone else. > > + */ > > +if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v ) > > +on_selected_cpus(cpumask_of(vpmu->last_pcpu), > > + vpmu_clear_last, v, 1); > > + > > if ( vpmu->arch_vpmu_ops && > vpmu->arch_vpmu_ops->arch_vpmu_destroy ) > > vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); > > } > > -- > > 1.7.1 > > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/4] x86: expose CMT L3 event mask to user space
>>> On 23.12.14 at 09:54, wrote: > --- a/xen/arch/x86/sysctl.c > +++ b/xen/arch/x86/sysctl.c > @@ -157,6 +157,11 @@ long arch_do_sysctl( > sysctl->u.psr_cmt_op.u.data = (ret ? 0 : info.size); > break; > } > +case XEN_SYSCTL_PSR_CMT_get_l3_event_mask: > +{ > +sysctl->u.psr_cmt_op.u.data = psr_cmt->l3.features; > +break; > +} Stray figure braces. Other than that Acked-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
On 07/01/2015 08:18, Andy Lutomirski wrote: >>> >> Thus far, I've been told unambiguously that a guest can't observe pvti >>> >> while it's being written, and I think you're now telling me that this >>> >> isn't true and that a guest *can* observe pvti while it's being >>> >> written while the low bit of the version field is not set. If so, >>> >> this is rather strongly incompatible with the spec in the KVM docs. >> > >> > Where am I saying that? > I thought the conclusion from what you and Marcelo pointed out about > the code was that, once the first vCPU updated its pvti, it could > start running guest code while the other vCPUs are still updating > pvti, so its guest code can observe the other vCPUs mid-update. Ah, in that sense you're right. However, each VCPU cannot observe _its own_ pvti entry while it's being written (no matter what's in the low bit of the version field). Paolo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 06.01.15 at 12:55, wrote: > On 06/01/15 02:18, Boris Ostrovsky wrote: > >> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h >> index 5f295f3..a5eb81c 100644 >> --- a/xen/include/xen/pci.h >> +++ b/xen/include/xen/pci.h >> @@ -56,6 +56,8 @@ struct pci_dev { >> >> u8 phantom_stride; >> >> +int node; /* NUMA node */ >> + >> enum pdev_type { >> DEV_TYPE_PCI_UNKNOWN, >> DEV_TYPE_PCIe_ENDPOINT, >> @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *, >> int pci_release_devices(struct domain *d); >> int pci_add_segment(u16 seg); >> const unsigned long *pci_get_ro_map(u16 seg); >> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *); >> +int pci_add_device(u16 seg, u8 bus, u8 devfn, >> + const struct pci_dev_info *, int); > > Please use parameter names in definitions. For the added parameter - yes. For the pre-existing pointer one I don't see a strong need to do so (and there was no name there before). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote: > .. and use this new interface to display it along with CPU topology > and NUMA information when 'xl info -n' command is issued > Also, can we see how an `xl info -n' looks, on an IONUMA system? > @@ -195,6 +195,24 @@ int xc_cputopoinfo(xc_interface *xch, > return 0; > } > > +int xc_pcitopoinfo(xc_interface *xch, > + xc_pcitopoinfo_t *put_info) > +{ > +int ret; > +DECLARE_SYSCTL; > + > +sysctl.cmd = XEN_SYSCTL_pcitopoinfo; > + > +memcpy(&sysctl.u.pcitopoinfo, put_info, sizeof(*put_info)); > + > +if ( (ret = do_sysctl(xch, &sysctl)) != 0 ) > +return ret; > + > +memcpy(put_info, &sysctl.u.pcitopoinfo, sizeof(*put_info)); > + > +return 0; > +} > @@ -5121,6 +5121,64 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx > *ctx, int *nb_cpu_out) > return ret; > } > > +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs) > +{ > +GC_INIT(ctx); > +xc_pcitopoinfo_t tinfo; > +DECLARE_HYPERCALL_BUFFER(xen_sysctl_pcitopo_t, pcitopo); > +libxl_pcitopology *ret = NULL; > +int i, rc; > + I see from where this comes from. However, at least from new functions, I think we should avoid using things like DECLARE_HYPERCALL_BUFFER etc., in libxl. They belong in libxc, IMO. This is basically what Andrew was doing here (although that was on xc_{topology,numa}info: http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/hwloc-support-experimental http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blobdiff;f=tools/libxc/xc_misc.c;h=4f672cead5a4d2b1fc23bcddb6fb49e4d6ec72b5;hp=330345459176961cacc7fda506e843831fe7156a;hb=3585994405b6a73c137309dd4be91f48c71e4903;hpb=9a80d5056766535ac624774b96495f8b97b1d28b Basically, what I'm suggesting is to have xc_pcitopoinfo() to do most of the work, with libxl_get_pci_topology being just a wrapper to it. In fact, if you grep/cscope for things like DECLARE_HYPERCALL_BUFFER in tools/libxl, the *only* results are these: libxl.c libxl_get_cpu_topology 5076 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap); libxl.c libxl_get_cpu_topology 5077 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap); libxl.c libxl_get_cpu_topology 5078 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap); libxl.c libxl_get_numainfo 5142 DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); libxl.c libxl_get_numainfo 5143 DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); libxl.c libxl_get_numainfo 5144 DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists); And I think we should work toward removing these too, rather than adding new ones! :-) Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 06.01.15 at 03:18, wrote: > @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, > XEN_GUEST_HANDLE_PARAM(void) arg) > } > else > pdev_info.is_virtfn = 0; > -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); > + > +if ( add.flags & XEN_PCI_DEV_PXM ) > +{ > +uint32_t pxm; > +int optarr_off = offsetof(struct physdev_pci_device_add, optarr) > / unsigned int or size_t. > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -56,6 +56,8 @@ struct pci_dev { > > u8 phantom_stride; > > +int node; /* NUMA node */ I thought I asked about this on v1 already: Does this really need to be an int, when commonly node numbers are stored in u8/unsigned char? Shrinking the field size would prevent the structure size from growing... Of course an additional question would be whether the node wouldn't better go into struct arch_pci_dev - that depends on whether we expect ARM to be using NUMA... Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] README, xen/Makefile: Update to Xen 4.5.0
On Tue, 2015-01-06 at 19:46 +0100, Sander Eikelenboom wrote: > > diff --git a/README b/README > > index 412607a..641bb23 100644 > > --- a/README > > +++ b/README > > @@ -1,9 +1,9 @@ > > # > > -__ ___ ___ _ > > > > -\ \/ /___ _ __ | || | | ___| _ _ _ __ ___| |_ __ _| |__ | | > > ___ > > - \ // _ \ '_ \ | || |_ |___ \ _| | | | '_ \/ __| __/ _` | '_ \| |/ _ > > \ > > - / \ __/ | | | |__ _| ___) |_| |_| | | | \__ \ || (_| | |_) | | > > __/ > > -/_/\_\___|_| |_||_|(_)/ \__,_|_| > > |_|___/\__\__,_|_.__/|_|\___| > > +__ ___ _ ___ > > +\ \/ /___ _ __ | || | | ___| / _ \ > > + \ // _ \ '_ \ | || |_ |___ \| | | | > > + / \ __/ | | | |__ _| ___) | |_| | > > +/_/\_\___|_| |_||_|(_)(_)___/ > > > > # > > > > @@ -19,14 +19,33 @@ is freely-distributable Open Source software, released > > under the GNU > > GPL. Since its initial public release, Xen has grown a large > > development community, spearheaded by xen.org (http://www.xen.org). > > Shouldn't this be "Xen-project" and "http://www.xenproject.org";, since the > transition > to the Linux foundation ? For the URLs, I guess, but it's not urgent for 4.5 IMHO, the old URLs aren't going anywhere. For the prose, NAK, the hypervisor is called "Xen". Lets not make the technical documentation use the long-winded and stilted "Xen Project Hypervisor" or whatever when a simple "Xen" works fine. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
On Mon, Jan 05, Vitaly Kuznetsov wrote: > Wei Liu writes: > > > Olaf mentioned his concern about handling ballooned pages in > > <20141211153029.ga1...@aepfle.de>. Is that point moot now? > > Well, the limitation is real and some guest-side handling will be > required in case we want to support kexec with ballooning. But as David > validly mentioned "It's the responsibility of the guest to ensure it > either doesn't kexec when it is ballooned or that the kexec kernel can > handle this". Not sure if we can (and need to) do anything hypevisor- or > toolstack-side. One approach would be to mark all pages as some sort of populate-on-demand first. Then copy the existing assigned pages from domA to domB and update the page type. The remaining pages are likely ballooned. Once the guest tries to access them this should give the hypervisor and/or toolstack a chance to assign a real RAM page to them. I mean, if a host-assisted approach for kexec is implemented then this approach must also cover ballooning. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
>>> On 06.01.15 at 14:41, wrote: > On 06/01/15 02:18, Boris Ostrovsky wrote: >> Instead of copying data for each field in xen_sysctl_topologyinfo separately >> put cpu/socket/node into a single structure and do a single copy for each >> processor. >> >> There is also no need to copy whole op to user at the end, max_cpu_index is >> sufficient >> >> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the >> fact >> that these are used for CPU topology. Subsequent patch will add support for >> PCI topology sysctl. >> >> Signed-off-by: Boris Ostrovsky > > If we are going to change the hypercall, then can we see about making it > a stable interface (i.e. not a sysctl/domctl)? There are non-toolstack > components which might want/need access to this information. (i.e. I am > still looking for a reasonable way to get this information from Xen in > hwloc) In which case leaving the sysctl alone and just adding a new non-sysctl interface should be considered. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
>>> On 06.01.15 at 03:18, wrote: > --- a/xen/common/sysctl.c > +++ b/xen/common/sysctl.c > @@ -365,6 +365,66 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) > u_sysctl) > } > break; > #ifdef HAS_PCI > +case XEN_SYSCTL_pcitopoinfo: > +{ > +xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo; > + > +if ( guest_handle_is_null(ti->pcitopo) || > + (ti->first_dev >= ti->num_devs) ) > +{ > +ret = -EINVAL; > +break; > +} > + > +for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ ) > +{ > +xen_sysctl_pcitopo_t pcitopo; > +struct pci_dev *pdev; > + > +if ( copy_from_guest_offset(&pcitopo, ti->pcitopo, > +ti->first_dev, 1) ) > +{ > +ret = -EFAULT; > +break; > +} > + > +spin_lock(&pcidevs_lock); > +pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus, > +pcitopo.pcidev.devfn); > +if ( !pdev || (pdev->node == NUMA_NO_NODE) ) > +pcitopo.node = INVALID_TOPOLOGY_ID; > +else > +pcitopo.node = pdev->node; Are hypervisor-internal node numbers really meaningful to the caller? > +spin_unlock(&pcidevs_lock); > + > +if ( copy_to_guest_offset(ti->pcitopo, ti->first_dev, __copy_ty_guest_offset() > + &pcitopo, 1) ) > +{ > +ret = -EFAULT; > +break; > +} > + > +if ( hypercall_preempt_check() ) > +break; You didn't increment ->first_dev yet, i.e. you'd start with the same index again when continuing later, and in the end you may not make any forward progress. > @@ -463,7 +464,7 @@ typedef struct xen_sysctl_lockprof_op > xen_sysctl_lockprof_op_t; > DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t); > > /* XEN_SYSCTL_cputopoinfo */ > -#define INVALID_TOPOLOGY_ID (~0U) > +#define INVALID_TOPOLOGY_ID (~0U) /* Also used by pcitopo */ Better extend the preceding comment. > @@ -492,6 +493,36 @@ struct xen_sysctl_cputopoinfo { > typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t; > DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t); > > +/* XEN_SYSCTL_pcitopoinfo */ > +struct xen_sysctl_pcitopo { > +struct physdev_pci_device pcidev; > +uint32_t node; > +}; > +typedef struct xen_sysctl_pcitopo xen_sysctl_pcitopo_t; > +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopo_t); > + > +struct xen_sysctl_pcitopoinfo { > +/* IN: Size of pcitopo array */ > +uint32_t num_devs; > + > +/* > + * IN/OUT: First element of pcitopo array that needs to be processed by > + * hypervisor. > + * This is used primarily by hypercall continuations and callers will > + * typically set it to zero > + */ > +uint32_t first_dev; > + > +/* > + * If not NULL, filled with node identifier for each pcidev The "If not NULL" would be meaningful only if NULL had a special meaning. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/7] tools/hotplug: remove SELinux options from var-lib-xenstored.mount
On Tue, Jan 06, Ian Campbell wrote: > On Fri, 2014-12-19 at 12:25 +0100, Olaf Hering wrote: ... > Acked-by: Ian Campbell > > (on commit s/Appearently/Apparently/; s/non-existant/non-existent/ in > the commit log) I made typos also in other commit messages. Should I resend the entire series, or will this be done during commit? Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/7] tools/hotplug: remove SELinux options from var-lib-xenstored.mount
On Wed, 2015-01-07 at 10:23 +0100, Olaf Hering wrote: > On Tue, Jan 06, Ian Campbell wrote: > > > On Fri, 2014-12-19 at 12:25 +0100, Olaf Hering wrote: > > ... > > > Acked-by: Ian Campbell > > > > (on commit s/Appearently/Apparently/; s/non-existant/non-existent/ in > > the commit log) > > I made typos also in other commit messages. Should I resend the entire > series, or will this be done during commit? Looks like Konrad already committed, I don't know if he fixed the typos (I suppose it doesn't matter now either way). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Nominations for Xen 4.5 stable tree maintainer.
>>> On 06.01.15 at 17:15, wrote: > Hello, > > Per http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases: > "Each stable branch has a maintainer who is nominated/volunteers according > to the Maintainer Election > process described in the project governance document > [http://www.xenproject.org/governance.html]. > This will resulting in the MAINTAINERS file in the relevant branch being > patched to include the maintainer." > > For the past year or so Jan Beulich has been the stable tree maintainer. I think it is most efficient for one person to maintain all stable trees, as e.g. the selection of what needs backporting is often pretty common between the trees. So if someone wants to urgently take over from me, I'd be handing over the 4.4 tree at once. But (just like Andrew stated fro XenServer) since I'm doing the backports for our internal purposes anyway, I would be happy to continue pushing the results to the respective upstream trees. > Since Xen 4.5 has branched that opens up a new stable tree and we can also > stop maintaining Xen 4.3 stable tree. Not just yet - there's certainly going to be a wrap-up 4.3.4. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xl only waits 33 seconds for ballooning to complete
On Tue, 2015-01-06 at 14:17 -0700, Mike Latimer wrote: > Hi, > > In a previous post (1), I mentioned issues seen while ballooning a large > amount of memory. In the current code, the ballooning process only has 33 > seconds to complete, or the xl operation (i.e. domain create) will fail. When > a lot of ballooning is required, or the host is very slow to balloon memory, > this delay is not sufficient. > > The code involved is tools/libxl/xl_cmdimpl.c:freemem. This function retries > 3 > times, and each retry includes a 10 second delay in > libxl_wait_for_free_memory > and a 1 second delay in libxl_wait_for_memory_target. > > Is there a better approach, which would account for ballooning operations > that > take a much longer time to complete? > > The easiest option is to simply increase the retry count, but that would > again > leave us with a fixed window of time for an operation to complete. It seems > like something that monitors the balloon process, and continues to wait if it > is progressing, might be a better approach. That's exactly what I was about to suggest as I read the penultimate paragraph, i.e. keep waiting so long as some reasonable delta occurs on each iteration. Ian. > > Any ideas? > > Thanks, > Mike > > 1. http://lists.xen.org/archives/html/xen-devel/2014-12/msg01443.html > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 7/7] tools/hotplug: add wrapper to start xenstored
On Tue, Jan 06, Ian Campbell wrote: > On Fri, 2014-12-19 at 12:25 +0100, Olaf Hering wrote: > > The shell wrapper in xenstored.service does not handle XENSTORE_TRACE. > > > > Create a separate wrapper script which is used in the sysv runlevel > > script and in systemd xenstored.service. It preserves existing > > behaviour by handling the XENSTORE_TRACE boolean. It also implements > > the handling of XENSTORED_ARGS=. This variable has to be added to > > sysconfig/xencommons. > > Why don't we just drop XENSTORED_* in favour of XENSTORED_ARGS (with an > example in the sysconfig file of enabling tracing if you like)? After having two weeks to think about this I came to the same conclusion. I think whatever the outcome is, the boolean should be removed. The sysconfig file should get a XENSTORED_ARGS="" along with a help text which mentions "-T /path" and "xenstored --help" to get other options because there is no man page. > Going to a wrapper script just to make some fairly uncommon debugging > option marginally more convenient seems like overkill to me, plus > XENSTORED_ARGS would allow for passing other useful options to > xenstored. If I recall correctly the point of the current 'sh -c "exec ..."' stunt was to expand the XENSTORE variable from the sysconfig file. But this approach leads to failures with SELinux because the socket passing does not work this way. Up to now I have not seen a success report for selinux+systemd+xenstored. Maybe its already somewhere in the other unread mails. In my cover letter I provided some possible ways to handle selinux+systemd+xenstored. Ideally the approach "Exec=/usr/bin/env $XENSTORED --no-fork $XENSTORED_ARGS" works because it means its possible to select a binary via the sysconfig file. But it also means the XENSTORE_TRACE boolean has to be removed in favour of the plain XENSTORED_ARGS= approach mentioned above. Finally this would avoid the need for a wrapper script. Hopefully someone with access to a SELinux enabled system will report which approach actually works. > > The wrapper uses exec unconditionally. This works because the > > systemd service file passes --no-fork, which has the desired effect > > that the binary launched by systemd becomes the final daemon > > process. The sysv script does not pass --no-fork, which causes > > xenstored to fork internally to return to the caller of the wrapper > > script. > > > > The place of the wrapper is currently LIBEXEC_BIN, it has to be > > decided what the final location is supposed to be. IanJ wants it in > > "/etc". > > If we go this route then I agree with Ian J. (/etc/xen/scripts, I > suppose). I have not heard back which location has to be used. If /etc/xen/scripts is the place, so be it. I thought this is just for hotplug scripts. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Xen-users] HVM PCI Passthrough: Code 12: Undersized PCI MMIO region?
>>> On 06.01.15 at 14:00, wrote: > On Tue, Dec 02, 2014 at 10:08:35PM -0500, Stephen Oberholtzer wrote: >> (1) Is my guess correct? Or at least close? Depends on whether you assign the devices at boot time of the guest, or via hotplug. In the latter case, I suppose only the MMIO hole size setting as pointed out by Don will help you, while in the former case I'd expect things to work (and if not, using a debug hypervisor with high enough log level would make hvmloader send information to the hypervisor log that ought to help diagnosing the issue). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 7/7] tools/hotplug: add wrapper to start xenstored
On Tue, Jan 06, Ian Jackson wrote: > Olaf Hering writes ("[PATCH 7/7] tools/hotplug: add wrapper to start > xenstored"): > > The shell wrapper in xenstored.service does not handle XENSTORE_TRACE. > ... > > +XENSTORED_LIBEXEC = xenstored.sh > > Should be in /etc as previously discussed. Previously I wrote: > > Bottom line: as relevant maintainer, I'm afraid I'm going to insist > that this script be in /etc. > > I'm disappointed. It is not acceptable to resubmit a change ignoring > such unequivocal feedback. Plain "/etc" wont work, I think. /etc/xen/scripts perhaps? But see my other reply to IanC, maybe there is a way to avoid the wrapper. And after having some time to think about this: If one has a need to adjust something, then this could be done in the xencommons script right away. In other words, the modification can be done there instead of calling the wrapper. > Nacked-by: Ian Jackson > > > +hotplug/Linux/xenstored.sh > > Although many of the existing hotplug scripts have this notion of > calling things "foo.sh" because they happen to be written in shell, I > think this is bad practice. > > I would prefer xenstored-wrap or some such. (My co-maintainers may > disagree...) But this is a bit of a bikeshed issue. I agree. Initally I had xenstored-launcher in mind. > > echo -n Starting $XENSTORED... > > - $XENSTORED --pid-file /var/run/xenstored.pid $XENSTORED_ARGS > > + XENSTORED=$XENSTORED \ > > + XENSTORED_TRACE=$XENSTORED_TRACE \ > > + XENSTORED_ARGS=$XENSTORED_ARGS \ > > + ${LIBEXEC_BIN}/xenstored.sh --pid-file > > /var/run/xenstored.pid > > It might be easier to "." xenstore-wrap. Failing that using `export' > will avoid this rather odd and repetitive style. I think thats a good idea. Something like this may work, doing the "." and the "exec" in the subshell: ( set -- --pid-file /var/run/xenstored.pid . xenstored.sh ) > > diff --git a/tools/hotplug/Linux/xenstored.sh.in > > b/tools/hotplug/Linux/xenstored.sh.in > > new file mode 100644 > > index 000..dc806ee > > --- /dev/null > > +++ b/tools/hotplug/Linux/xenstored.sh.in > > @@ -0,0 +1,6 @@ > > +#!/bin/sh > > +if test -n "$XENSTORED_TRACE" > > +then > > + XENSTORED_ARGS=" -T /var/log/xen/xenstored-trace.log" > > +fi > > +exec $XENSTORED $@ $XENSTORED_ARGS > > This should probably have "" around "$@" just in case. Ok. I will wait for results from SELinux testing before respinning this patch. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Crash on efi_reset_machine on Lenovo ThinkCentre m93 (Haswell)
>>> On 05.01.15 at 16:04, wrote: Odd - > (XEN) [ Xen-4.4.1 x86_64 debug=n Not tainted ] > (XEN) CPU:0 > (XEN) RIP:e008:[] d5fd8412 the address here ... > (XEN) [ Xen-4.5.0-rc-lK x86_64 debug=y Tainted:C ] > (XEN) CPU:0 > (XEN) RIP:e008:[ ] d5fd83d0 ... is different from the one here, yet one would think the firmware always does the same thing on a given runtime services call. Except if the page here wasn't marked for runtime use in the memory map (which you didn't provide). > I hadn't dug deep enough in this to figure out how it works on Linux but > was wondering if anybody else had seen this? Iirc on Linux rebooting via runtime services is only possible when explicitly asking for it on the command line. In any event this very much looks like a firmware issue (and knowing what's at the addresses in question would be interesting), and the only workaround I can think of would be to use "no-efi-rs". Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 Development Update (GA slip by a week)
On Tue, 2015-01-06 at 12:23 -0800, Suriyan Ramasami wrote: > On Tue, Jan 6, 2015 at 10:54 AM, Suriyan Ramasami wrote: > > On Tue, Jan 6, 2015 at 10:51 AM, Lars Kurth > > wrote: > >> > >> On 6 Jan 2015, at 18:08, Suriyan Ramasami wrote: > >> > >>> > >>> I shall try my hand at updating the information again. If this needs > >>> to be done (yesterday), then as a temporary solution we could just > >>> delete this information for now, and I shall work on it soon. > >> > >> Ideally it needs to be done by next Wed. If you have the content, you > >> could send it to me and I can fix the page > > > > Thanks Lars for the offer. I just started editing the page, and looks > > simple enough to get some content out there. I should have updated the > > relevant information in a few hours from now. Please do help me in > > fixing the page once its done (if it needs fixing) > > > > Thanks! > > - Suriyan > > Hello Lars/Ian, >I have updated the wiki somewhat to an OK level of information. Thanks, I've removed the todo banner since there is some content now (I've not reviewed it in detail, since I don't know about the h/w, but it all looks very plausible). Earlier you said: > I did have specific content for this wiki page, as the Arndale XEN > information is Linaro centric and hence not applicable to the OdroidXU > - especially the boot loader part and the linux dom0 part. The rest of > the information is quite similar. I think that some of the Linaro specific stuff on the Arndale page is no longer needed, e.g. I run mainline u-boot and Linux just fine on one these days. So I think it is probably worth trying to refactor into a generic Exynos part and board specific details, but that's not urgent for the 4.5 release. Anyway, not asking you to tackle all that. I've added it to http://wiki.xen.org/wiki/Xen_Document_Days/TODO, along with my previous suggestion to get rid of the duplicated lists on the main ARM page. I might try and have a go at that on the next doc day. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/7 v3] tools/hotplug: systemd changes for 4.5
On Mon, Jan 05, Konrad Rzeszutek Wilk wrote: > +Release Issues > +== > + > +While we did the utmost to get a release out, there are certain > +fixes which were not complete on time. As such please reference this > +section if you are running into trouble. > + > +* systemd not working with Fedora Core 20, 21 or later (systemctl > + reports xenstore failing to start). > + > + Systemd support is now part of Xen source code. While utmost work has > + been done to make the systemd files compatible across all the > + distributions, there might issues when using systemd files from > + Xen sources. The work-around is to define an mount entry in > + /etc/fstab as follow: > + > + tmpfs /var/lib/xenstored tmpfs > + mode=755,context="system_u:object_r:xenstored_var_lib_t:s0" 0 0 > + > + Shouldnt this go into a new SELinux section in the INSTALL file? Its my understanding that the reported SELinux failure is not only related to the context= mount option, but also to the socket passing from systemd. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] qemu upstream arm compile error failed
I tried to use the qemu upstream on xen Dom0 (arm arch) to get the PVFB, PVKBD and PVMOUSE work. However, I got compiler error at the begin. Here are the detail steps: 1. download the source code from " git://git.qemu-project.org/qemu.git ". 2. " cd qemu " 3. configure using command "./configure --cross-prefix=arm-linux-gnueabihf- --target-list=arm-softmmu --enable-fdt" 4. " make " However, I got the following compile error GEN tests/test-qmp-commands.h GEN tests/test-qapi-event.h CCtests/qemu-iotests/socket_scm_helper.o LINK tests/qemu-iotests/socket_scm_helper /usr/lib/gcc-cross/arm-linux-gnueabihf/4.8/../../../../arm-linux-gnueabihf/bin/ld: cannot find -lgthread-2.0 /usr/lib/gcc-cross/arm-linux-gnueabihf/4.8/../../../../arm-linux-gnueabihf/bin/ld: cannot find -lglib-2.0 collect2: error: ld returned 1 exit status make: *** [tests/qemu-iotests/socket_scm_helper] Error 1 Looks like the build process need some libs missed in my gcc-cross/arm-linux-gnueabihf/4.8 What is the reason and solution? Thanks & Regards, Mao ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] collect Dirty bitmap from Xen source code
I am working on Xen live Migration. I want to print dirty_bitmap which is given in XEN_GUEST_HANDLE_64(uint8) dirty_bitmap in shadow.h and Dirty bit tracking is given in: int shadow_track_dirty_vram(struct domain *d, unsigned long first_pfn, unsigned long nr, XEN_GUEST_HANDLE_64(uint8) dirty_bitmap); Anyone can help me to print dirty bitmap as i want to collect dirty bitmap from log files or I want to print dirty bitmap.___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] qemu upstream arm compile error failed
On Wed, 2015-01-07 at 18:02 +0800, Mao Mingy wrote: > I tried to use the qemu upstream on xen Dom0 (arm arch) to get the > PVFB, PVKBD and PVMOUSE work. With Xen 4.5 qemu is automatically built for ARM, no need to do it separately AFAIK. Also note that you need to build qemu for i386, even on ARM (since the PV backends are currently entwined with x86 for historical reasons. This doesn't matter since qemu does no CPU emulation under Xen anyway. Lastly: > /usr/lib/gcc-cross/arm-linux-gnueabihf/4.8/../../../../arm-linux-gnueabihf/bin/ld: > cannot find -lgthread-2.0 > /usr/lib/gcc-cross/arm-linux-gnueabihf/4.8/../../../../arm-linux-gnueabihf/bin/ld: > cannot find -lglib-2.0 I think it is pretty obvious from these error messages that you are missing some build dependencies, this isn't a Xen specific issue AFAICT. The solution is to make ARM versions of those libraries available in your cross environment. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86/viridian: Add Partition Reference Time enlightenment
>>> On 14.10.14 at 12:45, wrote: > The presence of the partition reference time enlightenment persuades newer > versions of Windows to prefer the TSC as their primary time source. Hence, > if rdtsc is not being emulated and is invariant then many vmexits (for > alternative time sources such as the HPET or reference counter MSR) can > be avoided. > > The implementation is not yet complete as no attempt is made to prevent > emulation of rdtsc if the enlightenment is active and guest and host > TSC frequencies differ. To do that requires invasive changes in the core > x86 time code and hence a lot more testing. > > This patch avoids the issue by disabling the enlightenment if rdtsc is > being emulated, causing Windows to choose another time source. This is > safe, but may cause a big variation in performance of guests migrated > between hosts of differing TSC frequency. Thus the enlightenment is not > enabled in the default set, but may be enabled to improve guest performance > where such migrations are not a concern. > > See section 15.4 of the Microsoft Hypervisor Top Level Functional > Specification v4.0a for details. > > Signed-off-by: Paul Durrant This doesn't apply anymore and hence needs to be re-spun against current staging. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86/viridian: Add Partition Reference Time enlightenment
> -Original Message- > From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: 07 January 2015 10:08 > To: Paul Durrant > Cc: Christoph Egger; Ian Campbell; Ian Jackson; Stefano Stabellini; xen- > de...@lists.xenproject.org; Keir (Xen.org) > Subject: Re: [PATCH v4] x86/viridian: Add Partition Reference Time > enlightenment > > >>> On 14.10.14 at 12:45, wrote: > > The presence of the partition reference time enlightenment persuades > newer > > versions of Windows to prefer the TSC as their primary time source. Hence, > > if rdtsc is not being emulated and is invariant then many vmexits (for > > alternative time sources such as the HPET or reference counter MSR) can > > be avoided. > > > > The implementation is not yet complete as no attempt is made to prevent > > emulation of rdtsc if the enlightenment is active and guest and host > > TSC frequencies differ. To do that requires invasive changes in the core > > x86 time code and hence a lot more testing. > > > > This patch avoids the issue by disabling the enlightenment if rdtsc is > > being emulated, causing Windows to choose another time source. This is > > safe, but may cause a big variation in performance of guests migrated > > between hosts of differing TSC frequency. Thus the enlightenment is not > > enabled in the default set, but may be enabled to improve guest > performance > > where such migrations are not a concern. > > > > See section 15.4 of the Microsoft Hypervisor Top Level Functional > > Specification v4.0a for details. > > > > Signed-off-by: Paul Durrant > > This doesn't apply anymore and hence needs to be re-spun against > current staging. Ok. I'll rebase and repost. Paul > > Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: don't crash when PTE bits 52 and up are non-zero
>>> On 23.12.14 at 07:52, wrote: >> From: Jan Beulich [mailto:jbeul...@suse.com] >> Sent: Friday, December 19, 2014 7:26 PM >> >> This can (and will) be legitimately the case when sharing page tables >> with EPT (more of a problem before p2m_access_rwx became zero, but >> still possible even now when other than that is the default for a >> guest), leading to an unconditional crash (in print_vtd_entries()) >> when a DMA remapping fault occurs. > > could you elaborate the scenarios when bits 52+ are non-zero? > >> >> Signed-off-by: Jan Beulich > > but the changes looks reasonable to me. > > Signed-off-by: Kevin Tian I translated this to a Reviewed-by, as S-o-b doesn't seem to make sense here. Konrad - please indicate whether this can also go into 4.5. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5] x86/viridian: Add Partition Reference Time enlightenment
The presence of the partition reference time enlightenment persuades newer versions of Windows to prefer the TSC as their primary time source. Hence, if rdtsc is not being emulated and is invariant then many vmexits (for alternative time sources such as the HPET or reference counter MSR) can be avoided. The implementation is not yet complete as no attempt is made to prevent emulation of rdtsc if the enlightenment is active and guest and host TSC frequencies differ. To do that requires invasive changes in the core x86 time code and hence a lot more testing. This patch avoids the issue by disabling the enlightenment if rdtsc is being emulated, causing Windows to choose another time source. This is safe, but may cause a big variation in performance of guests migrated between hosts of differing TSC frequency. Thus the enlightenment is not enabled in the default set, but may be enabled to improve guest performance where such migrations are not a concern. See section 15.4 of the Microsoft Hypervisor Top Level Functional Specification v4.0a for details. Signed-off-by: Paul Durrant Cc: Keir Fraser Cc: Jan Beulich Acked-by: Ian Campbell Cc: Ian Jackson Cc: Stefano Stabellini Reviewed-by: Christoph Egger --- v5: - Re-based to staging docs/man/xl.cfg.pod.5 |6 ++ tools/libxl/libxl_dom.c|3 + tools/libxl/libxl_types.idl|1 + xen/arch/x86/hvm/viridian.c| 111 xen/include/asm-x86/hvm/viridian.h | 25 +++ xen/include/public/arch-x86/hvm/save.h | 11 xen/include/public/hvm/params.h|7 +- 7 files changed, 163 insertions(+), 1 deletion(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 622ea53..e2f91fc 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -1231,6 +1231,12 @@ This group incorporates Partition Time Reference Counter MSR. This enlightenment can improve performance of Windows 8 and Windows Server 2012 onwards. +=item B + +This set incorporates the Partition Reference TSC MSR. This +enlightenment can improve performance of Windows 7 and Windows +Server 2008 R2 onwards. + =item B This is a special value that enables the default set of groups, which diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 1d33a18..48d661a 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -263,6 +263,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid, if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT)) mask |= HVMPV_time_ref_count; +if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC)) +mask |= HVMPV_reference_tsc; + if (mask != 0 && xc_hvm_param_set(CTX->xch, domid, diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index f7fc695..1214d2e 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -192,6 +192,7 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [ (0, "base"), (1, "freq"), (2, "time_ref_count"), +(3, "reference_tsc"), ]) # diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c index 3197b6b..cb689f6 100644 --- a/xen/arch/x86/hvm/viridian.c +++ b/xen/arch/x86/hvm/viridian.c @@ -21,6 +21,7 @@ #define VIRIDIAN_MSR_HYPERCALL 0x4001 #define VIRIDIAN_MSR_VP_INDEX 0x4002 #define VIRIDIAN_MSR_TIME_REF_COUNT 0x4020 +#define VIRIDIAN_MSR_REFERENCE_TSC 0x4021 #define VIRIDIAN_MSR_TSC_FREQUENCY 0x4022 #define VIRIDIAN_MSR_APIC_FREQUENCY 0x4023 #define VIRIDIAN_MSR_EOI0x4070 @@ -40,6 +41,7 @@ #define CPUID3A_MSR_APIC_ACCESS(1 << 4) #define CPUID3A_MSR_HYPERCALL (1 << 5) #define CPUID3A_MSR_VP_INDEX (1 << 6) +#define CPUID3A_MSR_REFERENCE_TSC (1 << 9) #define CPUID3A_MSR_FREQ (1 << 11) /* Viridian CPUID 404, Implementation Recommendations. */ @@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned int *eax, *eax |= CPUID3A_MSR_FREQ; if ( viridian_feature_mask(d) & HVMPV_time_ref_count ) *eax |= CPUID3A_MSR_TIME_REF_COUNT; +if ( viridian_feature_mask(d) & HVMPV_reference_tsc ) +*eax |= CPUID3A_MSR_REFERENCE_TSC; break; case 4: /* Recommended hypercall usage. */ @@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu *v) v, aa->fields.enabled, (unsigned long)aa->fields.pfn); } +static void dump_reference_tsc(const struct domain *d) +{ +const union viridian_reference_tsc *rt; + +rt = &d->arch.hvm_domain.viridian.reference_tsc; + +printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled: %x pfn: %lx\n", + d->domain_id, + rt->fields.e
Re: [Xen-devel] [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
On 07/01/15 09:10, Olaf Hering wrote: > On Mon, Jan 05, Vitaly Kuznetsov wrote: > >> Wei Liu writes: >> >>> Olaf mentioned his concern about handling ballooned pages in >>> <20141211153029.ga1...@aepfle.de>. Is that point moot now? >> >> Well, the limitation is real and some guest-side handling will be >> required in case we want to support kexec with ballooning. But as David >> validly mentioned "It's the responsibility of the guest to ensure it >> either doesn't kexec when it is ballooned or that the kexec kernel can >> handle this". Not sure if we can (and need to) do anything hypevisor- or >> toolstack-side. > > One approach would be to mark all pages as some sort of > populate-on-demand first. Then copy the existing assigned pages from > domA to domB and update the page type. The remaining pages are likely > ballooned. Once the guest tries to access them this should give the > hypervisor and/or toolstack a chance to assign a real RAM page to them. > > I mean, if a host-assisted approach for kexec is implemented then this > approach must also cover ballooning. It is not possible for the hypervisor or toolstack to do what you want because there may not be enough free memory to repopulate the new domain. The guest can handle this by: 1. Not ballooning (this is common in cloud environments). 2. Reducing the balloon prior to kexec. 3. Running the kexec'd image in a reserved chunk of memory (the crash kernel case). 4. Providing balloon information to the kexec'd image. None of these require any additional hypervisor or toolstack support and 1-3 are trivial for a guest to implement. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
Olaf Hering writes: > On Mon, Jan 05, Vitaly Kuznetsov wrote: > >> Wei Liu writes: >> >> > Olaf mentioned his concern about handling ballooned pages in >> > <20141211153029.ga1...@aepfle.de>. Is that point moot now? >> >> Well, the limitation is real and some guest-side handling will be >> required in case we want to support kexec with ballooning. But as David >> validly mentioned "It's the responsibility of the guest to ensure it >> either doesn't kexec when it is ballooned or that the kexec kernel can >> handle this". Not sure if we can (and need to) do anything hypevisor- or >> toolstack-side. > > One approach would be to mark all pages as some sort of > populate-on-demand first. Then copy the existing assigned pages from > domA to domB and update the page type. The remaining pages are likely > ballooned. Once the guest tries to access them this should give the > hypervisor and/or toolstack a chance to assign a real RAM page to > them. The thing is .. we don't have these pages when kexec is being performed, they are already ballooned out and the hypervisor doesn't have the knowledge of which GFNs should be re-populated. I think it is possible to keep track of all pages the guest balloons out for this purpose, but .. > > I mean, if a host-assisted approach for kexec is implemented then this > approach must also cover ballooning. I don't see why solving the issue hypervisor-side is a must. When the guest performs kdump we don't care about the ballooning as we have a separate memory area which is supposed to have no ballooned out pages. When we do kexec nothing stops us from asking balloon driver to bring everything back, it is fine to perform non-trivial work before kexec (e.g. we shutdown all the devices). But, as I said, I'll try playing with ballooning to make these thoughts not purely theoretical. -- Vitaly ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
>>> On 07.01.15 at 11:41, wrote: > On 07/01/15 09:10, Olaf Hering wrote: >> On Mon, Jan 05, Vitaly Kuznetsov wrote: >> >>> Wei Liu writes: >>> Olaf mentioned his concern about handling ballooned pages in <20141211153029.ga1...@aepfle.de>. Is that point moot now? >>> >>> Well, the limitation is real and some guest-side handling will be >>> required in case we want to support kexec with ballooning. But as David >>> validly mentioned "It's the responsibility of the guest to ensure it >>> either doesn't kexec when it is ballooned or that the kexec kernel can >>> handle this". Not sure if we can (and need to) do anything hypevisor- or >>> toolstack-side. >> >> One approach would be to mark all pages as some sort of >> populate-on-demand first. Then copy the existing assigned pages from >> domA to domB and update the page type. The remaining pages are likely >> ballooned. Once the guest tries to access them this should give the >> hypervisor and/or toolstack a chance to assign a real RAM page to them. >> >> I mean, if a host-assisted approach for kexec is implemented then this >> approach must also cover ballooning. > > It is not possible for the hypervisor or toolstack to do what you want > because there may not be enough free memory to repopulate the new domain. > > The guest can handle this by: > > 1. Not ballooning (this is common in cloud environments). > 2. Reducing the balloon prior to kexec. Which may fail because again there may not be enough memory to claim back from the hypervisor. Jan > 3. Running the kexec'd image in a reserved chunk of memory (the crash > kernel case). > 4. Providing balloon information to the kexec'd image. > > None of these require any additional hypervisor or toolstack support and > 1-3 are trivial for a guest to implement. > > David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
On Wed, Jan 07, David Vrabel wrote: > 2. Reducing the balloon prior to kexec. We carry a patch for kexec(1) which does balloon up before doing the actual kexec call. I propose to get such change into the upstream kexec tools if that is indeed the way to go. The benefit is that the guest waits until every ballooned page is populated. If the host is short on memory then the guest will hang instead of crash after kexec. https://build.opensuse.org/package/view_file/Kernel:kdump/kexec-tools/kexec-tools-xen-balloon-up.patch Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
On Wed, Jan 07, Vitaly Kuznetsov wrote: > The thing is .. we don't have these pages when kexec is being performed, > they are already ballooned out and the hypervisor doesn't have the > knowledge of which GFNs should be re-populated. I think it is possible > to keep track of all pages the guest balloons out for this purpose, but > .. Have you tried to make the new guest a PoD guest? That way it may work out of the box already. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs
Changes from v1: * Move event type check from xc to xl; * Add retry capability for MBM sampling; * Fix Coding style/docs; Intel Memory Bandwidth Monitoring(MBM) is a new hardware feature which builds on the CMT infrastructure to allow monitoring of system memory bandwidth. Event codes are provided to monitor both "total" and "local" bandwidth, meaning bandwidth over QPI and other external links can be monitored. For XEN, MBM is used to monitor memory bandwidth for VMs. Due to its dependency on CMT, the software also makes use of most of CMT codes. Actually, besides introducing two additional events and some cpuid feature bits, there are no extra changes compared to cache occupancy monitoring in CMT. Due to this, CMT should be enabled first to use this feature. For interface changes, the patch serial only introduces a new command "XEN_SYSCTL_PSR_CMT_get_l3_event_mask" which exposes MBM feature capability to user space and introduces two additional options for "xl psr-cmt-show": total_mem_bandwidth: Show total memory bandwidth local_mem_bandwidth: Show local memory bandwidth The usage flow keeps the same with CMT. Chao Peng (5): x86: expose CMT L3 event mask to user space tools: add routine to get CMT L3 event mask tools: correct coding style for psr tools: code refactoring for MBM tools: add total/local memory bandwith monitoring docs/man/xl.pod.1 |9 +++ tools/libxc/include/xenctrl.h | 11 +-- tools/libxc/xc_psr.c | 42 ++-- tools/libxl/libxl.h | 20 -- tools/libxl/libxl_psr.c | 147 +++-- tools/libxl/libxl_types.idl |2 + tools/libxl/xl_cmdimpl.c | 69 +-- tools/libxl/xl_cmdtable.c |4 +- xen/arch/x86/sysctl.c |3 + xen/include/public/sysctl.h |1 + 10 files changed, 254 insertions(+), 54 deletions(-) -- 1.7.9.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 3/5] tools: correct coding style for psr
- space: remove space after '(' or before ')' in 'if' condition; - indention: align function definition/call arguments; Signed-off-by: Chao Peng --- tools/libxc/include/xenctrl.h |8 tools/libxc/xc_psr.c | 10 +- tools/libxl/libxl.h | 11 +++ tools/libxl/libxl_psr.c | 11 +++ tools/libxl/xl_cmdimpl.c | 11 ++- 5 files changed, 29 insertions(+), 22 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 96b357c..c6e9e3e 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2693,15 +2693,15 @@ typedef enum xc_psr_cmt_type xc_psr_cmt_type; int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid); int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid); int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid, -uint32_t *rmid); + uint32_t *rmid); int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid); int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch, -uint32_t *upscaling_factor); + uint32_t *upscaling_factor); int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask); int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, uint32_t *l3_cache_size); -int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, -uint32_t cpu, uint32_t psr_cmt_type, uint64_t *monitor_data); +int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu, +uint32_t psr_cmt_type, uint64_t *monitor_data); int xc_psr_cmt_enabled(xc_interface *xch); #endif diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c index e76a0f9..e3ecc41 100644 --- a/tools/libxc/xc_psr.c +++ b/tools/libxc/xc_psr.c @@ -47,7 +47,7 @@ int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid) } int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid, -uint32_t *rmid) + uint32_t *rmid) { int rc; DECLARE_DOMCTL; @@ -88,7 +88,7 @@ int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid) } int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch, -uint32_t *upscaling_factor) + uint32_t *upscaling_factor) { static int val = 0; int rc; @@ -137,7 +137,7 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask) } int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, - uint32_t *l3_cache_size) + uint32_t *l3_cache_size) { static int val = 0; int rc; @@ -162,8 +162,8 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, return rc; } -int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, -uint32_t cpu, xc_psr_cmt_type type, uint64_t *monitor_data) +int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu, +xc_psr_cmt_type type, uint64_t *monitor_data) { xc_resource_op_t op; xc_resource_entry_t entries[2]; diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 42ace76..d84ff7f 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -1455,10 +1455,13 @@ int libxl_psr_cmt_domain_attached(libxl_ctx *ctx, uint32_t domid); int libxl_psr_cmt_enabled(libxl_ctx *ctx); int libxl_psr_cmt_type_supported(libxl_ctx *ctx, libxl_psr_cmt_type type); int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid); -int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid, -uint32_t *l3_cache_size); -int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid, -uint32_t socketid, uint32_t *l3_cache_occupancy); +int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, +uint32_t socketid, +uint32_t *l3_cache_size); +int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, + uint32_t domid, + uint32_t socketid, + uint32_t *l3_cache_occupancy); #endif /* misc */ diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c index 3018a0d..0f2c7e0 100644 --- a/tools/libxl/libxl_psr.c +++ b/tools/libxl/libxl_psr.c @@ -153,8 +153,9 @@ int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid) return rc; } -int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid, - uint32_t *l3_cache_size) +int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, +uint32_t socketid, +uint32_t *l3_cache_size) { GC_INIT(ctx); @@ -178,8 +179,10 @@ out: return rc; } -int libxl_psr_cmt_get_cache_occupancy(libxl_c
[Xen-devel] [PATCH v2 5/5] tools: add total/local memory bandwith monitoring
Add Memory Bandwidth Monitoring(MBM) for VMs. Two types of monitoring are supported: total and local memory bandwidth monitoring. To use it, CMT should be enabled in hypervisor. Signed-off-by: Chao Peng --- docs/man/xl.pod.1 |9 + tools/libxc/include/xenctrl.h |2 ++ tools/libxc/xc_psr.c |8 + tools/libxl/libxl.h |8 + tools/libxl/libxl_psr.c | 75 + tools/libxl/libxl_types.idl |2 ++ tools/libxl/xl_cmdimpl.c | 18 ++ tools/libxl/xl_cmdtable.c |4 ++- 8 files changed, 125 insertions(+), 1 deletion(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index 6b89ba8..0370625 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -1461,6 +1461,13 @@ is domain level. To monitor a specific domain, just attach the domain id with the monitoring service. When the domain doesn't need to be monitored any more, detach the domain id from the monitoring service. +Intel Broadwell and later server platforms also offer total/local memory +bandwidth monitoring. Xen supports per-domain monitoring for these two +additional monitoring types. Both memory bandwidth monitoring and L3 cache +occupancy monitoring share the same set of underground monitoring service. Once +a domain is attached to the monitoring service, monitoring data can be showed +for any of these monitoring types. + =over 4 =item B [I] @@ -1476,6 +1483,8 @@ detach: Detach the platform shared resource monitoring service from a domain. Show monitoring data for a certain domain or all domains. Current supported monitor types are: - "cache-occupancy": showing the L3 cache occupancy. + - "total-mem-bandwidth": showing the total memory bandwidth. + - "local-mem-bandwidth": showing the local memory bandwidth. =back diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index c6e9e3e..06366b5 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2688,6 +2688,8 @@ int xc_resource_op(xc_interface *xch, uint32_t nr_ops, xc_resource_op_t *ops); #if defined(__i386__) || defined(__x86_64__) enum xc_psr_cmt_type { XC_PSR_CMT_L3_OCCUPANCY, +XC_PSR_CMT_TOTAL_MEM_BANDWIDTH, +XC_PSR_CMT_LOCAL_MEM_BANDWIDTH, }; typedef enum xc_psr_cmt_type xc_psr_cmt_type; int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid); diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c index e3ecc41..99cb754 100644 --- a/tools/libxc/xc_psr.c +++ b/tools/libxc/xc_psr.c @@ -23,6 +23,8 @@ #define IA32_CMT_CTR_ERROR_MASK (0x3ull << 62) #define EVTID_L3_OCCUPANCY 0x1 +#define EVTID_TOTAL_MEM_BANDWIDTH 0x2 +#define EVTID_LOCAL_MEM_BANDWIDTH 0x3 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid) { @@ -175,6 +177,12 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu, case XC_PSR_CMT_L3_OCCUPANCY: evtid = EVTID_L3_OCCUPANCY; break; +case XC_PSR_CMT_TOTAL_MEM_BANDWIDTH: +evtid = EVTID_TOTAL_MEM_BANDWIDTH; +break; +case XC_PSR_CMT_LOCAL_MEM_BANDWIDTH: +evtid = EVTID_LOCAL_MEM_BANDWIDTH; +break; default: return -1; } diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index d84ff7f..5ab9d0c 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -1462,6 +1462,14 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid, uint32_t socketid, uint32_t *l3_cache_occupancy); +int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx, + uint32_t domid, + uint32_t socketid, + uint32_t *bandwidth); +int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx, + uint32_t domid, + uint32_t socketid, + uint32_t *bandwidth); #endif /* misc */ diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c index 42e74d2..a0cda89 100644 --- a/tools/libxl/libxl_psr.c +++ b/tools/libxl/libxl_psr.c @@ -18,6 +18,7 @@ #define IA32_QM_CTR_ERROR_MASK (0x3ul << 62) +#define MBM_SAMPLE_RETRY_MAX 4 static void libxl__psr_cmt_log_err_msg(libxl__gc *gc, int err) { @@ -240,6 +241,80 @@ out: return rc; } +static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc, +uint32_t domid, +xc_psr_cmt_type type, +uint32_t socketid, +uint32_t *bandwidth) +{ +uint64_t sample1, sample2; +uint32_t upscaling_factor; +int retry_attempts = 0; +int rc; + +retry: +rc = libxl__psr_cmt_get_l
[Xen-devel] [PATCH v2 4/5] tools: code refactoring for MBM
Make some internal routines common so that total/local memory bandwidth monitoring in the next patch can make use of them. Signed-off-by: Chao Peng --- tools/libxl/libxl_psr.c | 51 ++- tools/libxl/xl_cmdimpl.c | 54 +++--- 2 files changed, 63 insertions(+), 42 deletions(-) diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c index 0f2c7e0..42e74d2 100644 --- a/tools/libxl/libxl_psr.c +++ b/tools/libxl/libxl_psr.c @@ -179,42 +179,53 @@ out: return rc; } -int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, - uint32_t domid, - uint32_t socketid, - uint32_t *l3_cache_occupancy) +static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc, + uint32_t domid, + xc_psr_cmt_type type, + uint32_t socketid, + uint64_t *data) { -GC_INIT(ctx); - unsigned int rmid; -uint32_t upscaling_factor; -uint64_t monitor_data; int cpu, rc; -xc_psr_cmt_type type; -rc = xc_psr_cmt_get_domain_rmid(ctx->xch, domid, &rmid); +rc = xc_psr_cmt_get_domain_rmid(CTX->xch, domid, &rmid); if (rc < 0 || rmid == 0) { LOGE(ERROR, "fail to get the domain rmid, " "or domain is not attached with platform QoS monitoring service"); -rc = ERROR_FAIL; -goto out; +return ERROR_FAIL; } cpu = libxl__pick_socket_cpu(gc, socketid); if (cpu < 0) { LOGE(ERROR, "failed to get socket cpu"); -rc = ERROR_FAIL; -goto out; +return ERROR_FAIL; } -type = XC_PSR_CMT_L3_OCCUPANCY; -rc = xc_psr_cmt_get_data(ctx->xch, rmid, cpu, type, &monitor_data); +rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data); if (rc < 0) { LOGE(ERROR, "failed to get monitoring data"); -rc = ERROR_FAIL; -goto out; +return ERROR_FAIL; } +return rc; +} + +int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, + uint32_t domid, + uint32_t socketid, + uint32_t *l3_cache_occupancy) +{ +GC_INIT(ctx); +uint64_t data; +uint32_t upscaling_factor; +int rc; + +rc= libxl__psr_cmt_get_l3_monitoring_data(gc, domid, + XC_PSR_CMT_L3_OCCUPANCY, + socketid, &data); +if (rc < 0) +goto out; + rc = xc_psr_cmt_get_l3_upscaling_factor(ctx->xch, &upscaling_factor); if (rc < 0) { LOGE(ERROR, "failed to get L3 upscaling factor"); @@ -222,8 +233,8 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, goto out; } -*l3_cache_occupancy = upscaling_factor * monitor_data / 1024; -rc = 0; +*l3_cache_occupancy = upscaling_factor * data / 1024; + out: GC_FREE; return rc; diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 24f3c8d..09ca73e 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -7846,12 +7846,13 @@ out: } #ifdef LIBXL_HAVE_PSR_CMT -static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo, - uint32_t nr_sockets) +static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo, + libxl_psr_cmt_type type, + uint32_t nr_sockets) { char *domain_name; uint32_t socketid; -uint32_t l3_cache_occupancy; +uint32_t data; if (!libxl_psr_cmt_domain_attached(ctx, dominfo->domid)) return; @@ -7861,15 +7862,21 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo, free(domain_name); for (socketid = 0; socketid < nr_sockets; socketid++) { -if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid, - &l3_cache_occupancy)) -printf("%13u KB", l3_cache_occupancy); +switch (type) { +case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY: +if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, + socketid, &data)) +printf("%13u KB", data); +break; +default: +return; +} } printf("\n"); } -static int psr_cmt_show_cache_occupancy(uint32_t domid) +static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid) { uint32_t i, socketid, nr_sockets, total_rmid; uint32_t l3_cache_size; @@ -7905,19 +7912,22 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid
[Xen-devel] [PATCH v2 2/5] tools: add routine to get CMT L3 event mask
This is the tools side wrapper for XEN_SYSCTL_PSR_CMT_get_l3_event_mask of XEN_SYSCTL_psr_cmt_op. Signed-off-by: Chao Peng --- tools/libxc/include/xenctrl.h |1 + tools/libxc/xc_psr.c | 24 tools/libxl/libxl.h |1 + tools/libxl/libxl_psr.c | 18 ++ 4 files changed, 44 insertions(+) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 0ad8b8d..96b357c 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2697,6 +2697,7 @@ int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid, int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid); int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch, uint32_t *upscaling_factor); +int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask); int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, uint32_t *l3_cache_size); int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c index 872e6dc..e76a0f9 100644 --- a/tools/libxc/xc_psr.c +++ b/tools/libxc/xc_psr.c @@ -112,6 +112,30 @@ int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch, return rc; } +int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask) +{ +static int val = 0; +int rc; +DECLARE_SYSCTL; + +if ( val ) +{ +*event_mask = val; +return 0; +} + +sysctl.cmd = XEN_SYSCTL_psr_cmt_op; +sysctl.u.psr_cmt_op.cmd = +XEN_SYSCTL_PSR_CMT_get_l3_event_mask; +sysctl.u.psr_cmt_op.flags = 0; + +rc = xc_sysctl(xch, &sysctl); +if ( !rc ) +val = *event_mask = sysctl.u.psr_cmt_op.u.data; + +return rc; +} + int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, uint32_t *l3_cache_size) { diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 0a123f1..42ace76 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -1453,6 +1453,7 @@ int libxl_psr_cmt_attach(libxl_ctx *ctx, uint32_t domid); int libxl_psr_cmt_detach(libxl_ctx *ctx, uint32_t domid); int libxl_psr_cmt_domain_attached(libxl_ctx *ctx, uint32_t domid); int libxl_psr_cmt_enabled(libxl_ctx *ctx); +int libxl_psr_cmt_type_supported(libxl_ctx *ctx, libxl_psr_cmt_type type); int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid); int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid, uint32_t *l3_cache_size); diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c index 0437465..3018a0d 100644 --- a/tools/libxl/libxl_psr.c +++ b/tools/libxl/libxl_psr.c @@ -120,6 +120,24 @@ int libxl_psr_cmt_enabled(libxl_ctx *ctx) return xc_psr_cmt_enabled(ctx->xch); } +int libxl_psr_cmt_type_supported(libxl_ctx *ctx, libxl_psr_cmt_type type) +{ +GC_INIT(ctx); +uint32_t event_mask; +int ret; + +ret = xc_psr_cmt_get_l3_event_mask(ctx->xch, &event_mask); +if (ret < 0) { +libxl__psr_cmt_log_err_msg(gc, errno); +ret = 0; +} else { +ret = event_mask & (1 << (type - 1)); +} + +GC_FREE; +return ret; +} + int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid) { GC_INIT(ctx); -- 1.7.9.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 Development Update (GA slip by a week)
On Tue, Jan 06, Konrad Rzeszutek Wilk wrote: > There is only one outstanding patch and that is "#7 tools/hotplug: add > wrapper to > start xenstored". Olaf is back tomorrow so it might make it .. or not. See my other replies. I think once we know how to deal with SELinux and systemd this change may be not needed anymore. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/5] x86: expose CMT L3 event mask to user space
L3 event mask indicates the event types supported in host, including cache occupancy event as well as local/total memory bandwidth events for Memory Bandwidth Monitoring(MBM). Expose it so all these events can be monitored in user space. Signed-off-by: Chao Peng Reviewed-by: Andrew Cooper Acked-by: Jan Beulich --- xen/arch/x86/sysctl.c |3 +++ xen/include/public/sysctl.h |1 + 2 files changed, 4 insertions(+) diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c index 57ad992..611a291 100644 --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -157,6 +157,9 @@ long arch_do_sysctl( sysctl->u.psr_cmt_op.u.data = (ret ? 0 : info.size); break; } +case XEN_SYSCTL_PSR_CMT_get_l3_event_mask: +sysctl->u.psr_cmt_op.u.data = psr_cmt->l3.features; +break; default: sysctl->u.psr_cmt_op.u.data = 0; ret = -ENOSYS; diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index b3713b3..8552dc6 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -641,6 +641,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t); /* The L3 cache size is returned in KB unit */ #define XEN_SYSCTL_PSR_CMT_get_l3_cache_size 2 #define XEN_SYSCTL_PSR_CMT_enabled 3 +#define XEN_SYSCTL_PSR_CMT_get_l3_event_mask 4 struct xen_sysctl_psr_cmt_op { uint32_t cmd; /* IN: XEN_SYSCTL_PSR_CMT_* */ uint32_t flags; /* padding variable, may be extended for future use */ -- 1.7.9.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/5] tools: correct coding style for psr
On Wed, Jan 07, 2015 at 07:12:03PM +0800, Chao Peng wrote: > - space: remove space after '(' or before ')' in 'if' condition; > - indention: align function definition/call arguments; > > Signed-off-by: Chao Peng Acked-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Nominations for Xen 4.5 stable tree maintainer.
On Tue, 2015-01-06 at 11:15 -0500, Konrad Rzeszutek Wilk wrote: > Hello, > > Per http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases: > "Each stable branch has a maintainer who is nominated/volunteers according to > the Maintainer Election > process described in the project governance document > [http://www.xenproject.org/governance.html]. > This will resulting in the MAINTAINERS file in the relevant branch being > patched to include the maintainer." > > For the past year or so Jan Beulich has been the stable tree maintainer. > > Since Xen 4.5 has branched that opens up a new stable tree and we can also > stop maintaining Xen 4.3 stable tree. > > The nominations are open - please volunteer yourself. In case nobody > volunteers I can also take the role. > > I ask folks to finish voting/nominating by Jan 14th so that when Xen 4.5 comes > out we have an viable stable tree maintainer. I'm not sure how voting is supposed to proceed with multiple nominations (and with the deadline for nominations apparently being the same as for voting), but given that Jan has thrown his hat in the ring and has been doing a fine job with the previous trees, +1 to Jan continuing as the stable tree maintainer. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/5] tools: add routine to get CMT L3 event mask
On 07/01/15 11:12, Chao Peng wrote: > This is the tools side wrapper for XEN_SYSCTL_PSR_CMT_get_l3_event_mask > of XEN_SYSCTL_psr_cmt_op. > > Signed-off-by: Chao Peng > --- > tools/libxc/include/xenctrl.h |1 + > tools/libxc/xc_psr.c | 24 > tools/libxl/libxl.h |1 + > tools/libxl/libxl_psr.c | 18 ++ > 4 files changed, 44 insertions(+) > > diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h > index 0ad8b8d..96b357c 100644 > --- a/tools/libxc/include/xenctrl.h > +++ b/tools/libxc/include/xenctrl.h > @@ -2697,6 +2697,7 @@ int xc_psr_cmt_get_domain_rmid(xc_interface *xch, > uint32_t domid, > int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid); > int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch, > uint32_t *upscaling_factor); > +int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask); > int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, > uint32_t *l3_cache_size); > int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, > diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c > index 872e6dc..e76a0f9 100644 > --- a/tools/libxc/xc_psr.c > +++ b/tools/libxc/xc_psr.c > @@ -112,6 +112,30 @@ int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch, > return rc; > } > > +int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask) > +{ > +static int val = 0; This should be uint32_t rather than int. I am somewhat concerned about multithreaded use of libxc, but this is not the first issue in libxc, and probably shouldn't be held against this patch. As the result of the hypercall is going to be the same, the worse that a race could achieve is a wasted hypercall. > +int rc; > +DECLARE_SYSCTL; > + > +if ( val ) > +{ > +*event_mask = val; > +return 0; > +} > + > +sysctl.cmd = XEN_SYSCTL_psr_cmt_op; > +sysctl.u.psr_cmt_op.cmd = > +XEN_SYSCTL_PSR_CMT_get_l3_event_mask; > +sysctl.u.psr_cmt_op.flags = 0; > + > +rc = xc_sysctl(xch, &sysctl); > +if ( !rc ) > +val = *event_mask = sysctl.u.psr_cmt_op.u.data; > + > +return rc; > +} > + > int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, >uint32_t *l3_cache_size) > { > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 0a123f1..42ace76 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -1453,6 +1453,7 @@ int libxl_psr_cmt_attach(libxl_ctx *ctx, uint32_t > domid); > int libxl_psr_cmt_detach(libxl_ctx *ctx, uint32_t domid); > int libxl_psr_cmt_domain_attached(libxl_ctx *ctx, uint32_t domid); > int libxl_psr_cmt_enabled(libxl_ctx *ctx); > +int libxl_psr_cmt_type_supported(libxl_ctx *ctx, libxl_psr_cmt_type type); > int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid); > int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid, > uint32_t *l3_cache_size); > diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c > index 0437465..3018a0d 100644 > --- a/tools/libxl/libxl_psr.c > +++ b/tools/libxl/libxl_psr.c > @@ -120,6 +120,24 @@ int libxl_psr_cmt_enabled(libxl_ctx *ctx) > return xc_psr_cmt_enabled(ctx->xch); > } > > +int libxl_psr_cmt_type_supported(libxl_ctx *ctx, libxl_psr_cmt_type type) > +{ > +GC_INIT(ctx); > +uint32_t event_mask; > +int ret; The libxl CODING_SYTLE states that this "ret" should be "rc" ~Andrew > + > +ret = xc_psr_cmt_get_l3_event_mask(ctx->xch, &event_mask); > +if (ret < 0) { > +libxl__psr_cmt_log_err_msg(gc, errno); > +ret = 0; > +} else { > +ret = event_mask & (1 << (type - 1)); > +} > + > +GC_FREE; > +return ret; > +} > + > int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid) > { > GC_INIT(ctx); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 06/12] xen: mark grant mapped pages as foreign
On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: > From: Jenny Herbert > > Use the "foreign" page flag to mark pages that have a grant map. Use > page->private to store information of the grant (the granting domain > and the grant reference). > > Signed-off-by: Jenny Herbert > Signed-off-by: David Vrabel > --- > arch/x86/xen/p2m.c| 50 > ++--- > include/xen/grant_table.h | 13 > 2 files changed, 56 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c > index 0d70814..22624a3 100644 > --- a/arch/x86/xen/p2m.c > +++ b/arch/x86/xen/p2m.c > @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned > long mfn) > return true; > } > > +static int > +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref) I'd be inclined to add "map" to the names somewhere, otherwise people might thing they need to call this when allocating a grant in the f.e. or other things. > +{ > +#ifdef CONFIG_X86_64 Rather than suggesting to add CONFIG_ARM_64 here I'll suggest BITS_PER_LONG >= 64. > + uint64_t gref; > + uint64_t* gref_p = &gref; > +#else > + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL); Might this allocation be happening during e.g. swapping? I suppose it is backend only, and swapping to a loopback vbd would be pretty mad. If you can figure a reasonable use case for that you might want some extra GFP flags? Might this be hot enough to warrant using a specific kmem_cache? > + if (!gref) > + return -ENOMEM; > + uint64_t* gref = gref_p; > +#endif > + > + *gref_p = ((uint64_t) grantref << 32) | domid; > + p->private = gref; There is a set_page_private() macro, which doesn't seem to do much but I suppose you should use it (and page_private() for accessing, if you don't already). > @@ -182,4 +183,16 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref > *unmap_ops, > void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count); > void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count); > > +static inline void > +get_page_grant_ref(struct page *p, domid_t* domid, grant_ref_t* grantref) > +{ BUG_ON(!PageBlah(p))? > +#ifdef CONFIG_X86_64 > + uint64_t gref = p->private; > +#else > + uint64_t gref = *p->private; > +#endif > + *domid = gref & 0x; > + *grantref = gref >> 32; > +} > + > #endif /* __ASM_GNTTAB_H__ */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [libvirt] [PATCH RFC] libxl: fix paths in capability string
On Tue, Jan 06, 2015 at 10:28:13AM -0700, Eric Blake wrote: > On 01/06/2015 09:12 AM, Wei Liu wrote: > > Currently libxl driver hardcodes some paths in its capability string, > > which might not be the correct paths. > > > > This patch introduces --with-libxl-prefix, so that user can specify the > > prefix used to build Xen tools. The default value is /usr/local which is > > the default --prefix for Xen tools. > > Why can't you just make '--with-libxl=/path/to/alt' work? That is, > --with-libxl=yes uses a default (which matches the prefix where LIBVIRT > will be installed [1]), --with-libxl=no turns it off, and specifying a > path says to use that path. Then you don't need to add a new option. > But --with-libxl is used to specify build time path which can be different from runtime path if I build libvirt against a non-installed version of xen. Consider I have a non-installed build of libxl in /local/scratch/xen.git/dist/install/usr/local but not /usr/local. I don't want that "/local/scratch..." end up in capability string. Wei. > [1] The assumption generally is that whoever is building libvirt has > also built libxl to be installed in the same location - which is a nicer > default than hard-coding a /usr/local default that libxl uses in > isolation. Of course, libvirt also defaults to /usr/local in isolation, > so if you don't specify anything at all, then ./configure will see that > libvirt is going into /usr/local and will therefore default to looking > for libxl also in /usr/local; but for a distro packager, it is nicer to > have the mere specification of --prefix switch all other relative > defaults to play nicely with everything else in the distro. > > -- > Eric Blake eblake redhat com+1-919-301-3266 > Libvirt virtualization library http://libvirt.org > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [libvirt] [PATCH RFC] libxl: fix paths in capability string
On Tue, Jan 06, 2015 at 05:36:57PM +, Daniel P. Berrange wrote: > On Tue, Jan 06, 2015 at 10:28:13AM -0700, Eric Blake wrote: > > On 01/06/2015 09:12 AM, Wei Liu wrote: > > > Currently libxl driver hardcodes some paths in its capability string, > > > which might not be the correct paths. > > > > > > This patch introduces --with-libxl-prefix, so that user can specify the > > > prefix used to build Xen tools. The default value is /usr/local which is > > > the default --prefix for Xen tools. > > > > Why can't you just make '--with-libxl=/path/to/alt' work? That is, > > --with-libxl=yes uses a default (which matches the prefix where LIBVIRT > > will be installed [1]), --with-libxl=no turns it off, and specifying a > > path says to use that path. Then you don't need to add a new option. > > Yep, that would be preferrable. > > Alternatively the next best would be for xen to include a pkg-config > configuration file, with a custom variable that reports the paths > required > This might be a useful thing in its own right. Thanks for the suggestion. Wei. > Regards, > Daniel > -- > |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| > |: http://libvirt.org -o- http://virt-manager.org :| > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC] libxl: fix paths in capability string
Jim, another idea: if those strings are likely to be wrong and in fact not used, can we just not print them? Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 07/12] xen-netback: use foreign page information from the pages themselves
On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: > From: Jenny Herbert > > Use the foreign page flag in netback to get the domid and grant ref > needed for the grant copy. This signficiantly simplifies the netback > code and makes netback work with foreign pages from other backends > (e.g., blkback). > > This allows blkback to use iSCSI disks provided by domUs running on > the same host. > > Signed-off-by: Jenny Herbert > Signed-off-by: David Vrabel > --- > drivers/net/xen-netback/netback.c | 97 > +++-- > 1 file changed, 6 insertions(+), 91 deletions(-) Gotta love that diffstat... Acked-by: Ian Campbell ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
"Jan Beulich" writes: On 07.01.15 at 11:41, wrote: >> On 07/01/15 09:10, Olaf Hering wrote: >>> On Mon, Jan 05, Vitaly Kuznetsov wrote: >>> Wei Liu writes: > Olaf mentioned his concern about handling ballooned pages in > <20141211153029.ga1...@aepfle.de>. Is that point moot now? Well, the limitation is real and some guest-side handling will be required in case we want to support kexec with ballooning. But as David validly mentioned "It's the responsibility of the guest to ensure it either doesn't kexec when it is ballooned or that the kexec kernel can handle this". Not sure if we can (and need to) do anything hypevisor- or toolstack-side. >>> >>> One approach would be to mark all pages as some sort of >>> populate-on-demand first. Then copy the existing assigned pages from >>> domA to domB and update the page type. The remaining pages are likely >>> ballooned. Once the guest tries to access them this should give the >>> hypervisor and/or toolstack a chance to assign a real RAM page to them. >>> >>> I mean, if a host-assisted approach for kexec is implemented then this >>> approach must also cover ballooning. >> >> It is not possible for the hypervisor or toolstack to do what you want >> because there may not be enough free memory to repopulate the new domain. >> >> The guest can handle this by: >> >> 1. Not ballooning (this is common in cloud environments). >> 2. Reducing the balloon prior to kexec. > > Which may fail because again there may not be enough memory to > claim back from the hypervisor. > Yes, but it may be better to cancel kexec at this point. -- Vitaly ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Nominations for Xen 4.5 stable tree maintainer.
On 07/01/2015 11:26, "Ian Campbell" wrote: >On Tue, 2015-01-06 at 11:15 -0500, Konrad Rzeszutek Wilk wrote: >> Hello, >> >> Per http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases: >> "Each stable branch has a maintainer who is nominated/volunteers >>according to the Maintainer Election >> process described in the project governance document >>[http://www.xenproject.org/governance.html]. >> This will resulting in the MAINTAINERS file in the relevant branch >>being patched to include the maintainer." >> >> For the past year or so Jan Beulich has been the stable tree maintainer. >> >> Since Xen 4.5 has branched that opens up a new stable tree and we can >>also >> stop maintaining Xen 4.3 stable tree. >> >> The nominations are open - please volunteer yourself. In case nobody >> volunteers I can also take the role. >> >> I ask folks to finish voting/nominating by Jan 14th so that when Xen >>4.5 comes >> out we have an viable stable tree maintainer. > >I'm not sure how voting is supposed to proceed with multiple nominations >(and with the deadline for nominations apparently being the same as for >voting), Actually, it is questionable whether there are multiple nominations. Andrew said "If Jan wants a break, I would be happy to volunteer." I am also not convinced that we need an election, unless the existing maintainer wants to steps down. We never had one in the past. And we don't have an explicit nomination for Release Managers unless the existing RM steps down. I can't find the mailing list discussion which led to http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases (the link in the change history seems to be wrong). Maybe we should just change the document to clarify that an election is only needed if the previous maintainer steps down, which is what I think the intention really was. Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: > From: Jenny Herbert > > Introduce gnttab_unmap_refs_async() that can be used to safely unmap > pages that may be in use (ref count > 1). If the pages are in use the > unmap is deferred and retried later. This polling is not very clever > but it should be good enough if the cases where the delay is necessary > are rare. > > This is needed to allow block backends using grant mapping to safely > use network storage (block or filesystem based such as iSCSI or NFS). > > The network storage driver may complete a block request whilst there > is a queued network packet retry (because the ack from the remote end > races with deciding to queue the retry). The pages for the retried > packet would be grant unmapped and the network driver (or hardware) > would access the unmapped page. I thought this had been solved a little while ago by mapping a scratch page on unmap even for kernel space grant mappings, but both the design doc and here imply not (i.e. the scratch is for user grant mappings only), so I must be misremembering. Regardless, this approach seems likely to be far better... ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
On Wed, 2015-01-07 at 12:00 +, Ian Campbell wrote: > On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: > > From: Jenny Herbert > > > > Introduce gnttab_unmap_refs_async() that can be used to safely unmap > > pages that may be in use (ref count > 1). If the pages are in use the > > unmap is deferred and retried later. This polling is not very clever > > but it should be good enough if the cases where the delay is necessary > > are rare. > > > > This is needed to allow block backends using grant mapping to safely > > use network storage (block or filesystem based such as iSCSI or NFS). > > > > The network storage driver may complete a block request whilst there > > is a queued network packet retry (because the ack from the remote end > > races with deciding to queue the retry). The pages for the retried > > packet would be grant unmapped and the network driver (or hardware) > > would access the unmapped page. > > I thought this had been solved a little while ago by mapping a scratch > page on unmap even for kernel space grant mappings, but both the design > doc and here imply not (i.e. the scratch is for user grant mappings > only), so I must be misremembering. > > Regardless, this approach seems likely to be far better... None of the following patches switch netback to use it though, I think because the use of SKBTX_DEV_ZEROCOPY makes it unnecessary. But perhaps this lazy unmap scheme would be a better fix than SKB zerocopy for that case too. The current stuff has the downside of always copying skbs from guests destined for dom0 itself (rather than just passing through on there way to a pysical NIC or another VIF), IIRC. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Nominations for Xen 4.5 stable tree maintainer.
On 07/01/15 11:59, Lars Kurth wrote: > > On 07/01/2015 11:26, "Ian Campbell" wrote: > >> On Tue, 2015-01-06 at 11:15 -0500, Konrad Rzeszutek Wilk wrote: >>> Hello, >>> >>> Per http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases: >>> "Each stable branch has a maintainer who is nominated/volunteers >>> according to the Maintainer Election >>> process described in the project governance document >>> [http://www.xenproject.org/governance.html]. >>> This will resulting in the MAINTAINERS file in the relevant branch >>> being patched to include the maintainer." >>> >>> For the past year or so Jan Beulich has been the stable tree maintainer. >>> >>> Since Xen 4.5 has branched that opens up a new stable tree and we can >>> also >>> stop maintaining Xen 4.3 stable tree. >>> >>> The nominations are open - please volunteer yourself. In case nobody >>> volunteers I can also take the role. >>> >>> I ask folks to finish voting/nominating by Jan 14th so that when Xen >>> 4.5 comes >>> out we have an viable stable tree maintainer. >> I'm not sure how voting is supposed to proceed with multiple nominations >> (and with the deadline for nominations apparently being the same as for >> voting), > Actually, it is questionable whether there are multiple nominations. > Andrew said "If Jan wants a break, I would be happy to volunteer." Quite - I am happy for Jan to continue in this role. I would also agree that having the same person doing all stable trees would be far more efficient than any split alternative. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests
On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: > In an x86 PV guest, get_user_pages_fast() on a userspace address range > containing foreign mappings does not work correctly because the M2P > lookup of the MFN from a userspace PTE may return the wrong page. > > Force get_user_pages_fast() to fail on such addresses by marking the PTEs > as special. > > If Xen has XENFEAT_gnttab_map_avail_bits (available since at least > 4.0), http://wiki.xenproject.org/wiki/Xen_Kernel_Feature_Matrix says the dom0 pvpops already requires >= 4.0 too, which matches my recollection (something to do with a new APIC interface which upstream insisted on during upstreaming, IIRC), but both could be out of date... Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 4/5] tools: code refactoring for MBM
On Wed, Jan 07, 2015 at 07:12:04PM +0800, Chao Peng wrote: [...] > -int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, > - uint32_t domid, > - uint32_t socketid, > - uint32_t *l3_cache_occupancy) > +static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc, > + uint32_t domid, > + xc_psr_cmt_type type, > + uint32_t socketid, > + uint64_t *data) > { > -GC_INIT(ctx); > - > unsigned int rmid; > -uint32_t upscaling_factor; > -uint64_t monitor_data; > int cpu, rc; > -xc_psr_cmt_type type; > > -rc = xc_psr_cmt_get_domain_rmid(ctx->xch, domid, &rmid); > +rc = xc_psr_cmt_get_domain_rmid(CTX->xch, domid, &rmid); > if (rc < 0 || rmid == 0) { > LOGE(ERROR, "fail to get the domain rmid, " > "or domain is not attached with platform QoS monitoring > service"); > -rc = ERROR_FAIL; > -goto out; > +return ERROR_FAIL; Please retain the "goto out" idiom if possible. > } > > cpu = libxl__pick_socket_cpu(gc, socketid); > if (cpu < 0) { > LOGE(ERROR, "failed to get socket cpu"); > -rc = ERROR_FAIL; > -goto out; > +return ERROR_FAIL; > } > > -type = XC_PSR_CMT_L3_OCCUPANCY; > -rc = xc_psr_cmt_get_data(ctx->xch, rmid, cpu, type, &monitor_data); > +rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data); > if (rc < 0) { > LOGE(ERROR, "failed to get monitoring data"); > -rc = ERROR_FAIL; > -goto out; > +return ERROR_FAIL; > } > > +return rc; > +} > + > +int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, > + uint32_t domid, > + uint32_t socketid, > + uint32_t *l3_cache_occupancy) > +{ > +GC_INIT(ctx); > +uint64_t data; > +uint32_t upscaling_factor; > +int rc; > + > +rc= libxl__psr_cmt_get_l3_monitoring_data(gc, domid, "rc =" Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Nominations for Xen 4.5 stable tree maintainer.
On Wed, 2015-01-07 at 11:59 +, Lars Kurth wrote: > > > On 07/01/2015 11:26, "Ian Campbell" wrote: > > >On Tue, 2015-01-06 at 11:15 -0500, Konrad Rzeszutek Wilk wrote: > >> Hello, > >> > >> Per http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases: > >> "Each stable branch has a maintainer who is nominated/volunteers > >>according to the Maintainer Election > >> process described in the project governance document > >>[http://www.xenproject.org/governance.html]. > >> This will resulting in the MAINTAINERS file in the relevant branch > >>being patched to include the maintainer." > >> > >> For the past year or so Jan Beulich has been the stable tree maintainer. > >> > >> Since Xen 4.5 has branched that opens up a new stable tree and we can > >>also > >> stop maintaining Xen 4.3 stable tree. > >> > >> The nominations are open - please volunteer yourself. In case nobody > >> volunteers I can also take the role. > >> > >> I ask folks to finish voting/nominating by Jan 14th so that when Xen > >>4.5 comes > >> out we have an viable stable tree maintainer. > > > >I'm not sure how voting is supposed to proceed with multiple nominations > >(and with the deadline for nominations apparently being the same as for > >voting), > > Actually, it is questionable whether there are multiple nominations. > Andrew said "If Jan wants a break, I would be happy to volunteer." True, and Konrad said "if nobody else...". Still, my +1 for Jan stands. > I am also not convinced that we need an election, unless the existing > maintainer wants to steps down. We never had one in the past. And we don't > have an explicit nomination for Release Managers unless the existing RM > steps down. > > I can't find the mailing list discussion which led to > http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases (the link > in the change history seems to be wrong). http://lists.xen.org/archives/html/xen-devel/2012-11/msg01391.html perhaps? > Maybe we should just change the > document to clarify that an election is only needed if the previous > maintainer steps down, which is what I think the intention really was. > Seems reasonable to me, presumably some existing mechanism (i.e. common sense...) exists if the incumbent goes off the rails or disappears without resigning etc. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] community develop/contributor call
Hi all, For a few years now I've been running a monthly phone call between various organisations which contribute to Xen, the so called "TCT Call"[0]. However since then the Xen Project has moved to the Linux foundation and the introduction of the Advisory Board means this call isn't really useful in its current form any more. Rather than just cancelling it I've been thinking of repurposing it into a more development focused call for all contributors (maintainers, patch submitters etc). The idea would be to provide a regular slot where technical and development topics can be discussed on an on demand basis without the need to setup an ad-hoc call each time something comes up. In other words to provide a forum for topics such as thrashing out a more complex design for a new feature, building consensus on how to move forward with a stalled patch series, discussing difficult freeze exception requests during release periods and cases where on list discussion has stalled or those involved feel the need to have a "live" chat about it for some reason. My intention would be the call would be targeted at maintainers and people who are themselves working on the code/design, although anyone will be welcome to dial in and take part. The plan would be to post a call for agenda items to xen-devel a week beforehand with the call being cancelled the day before if no topics were proposed (I have no problem with the call only happening occasionally, there's no point in having it if there are no topics, but if it's useful a couple of times a year it's worth having IMHO). I will try and reach out to anyone who should be involved with a proposed topic to try and ensure that the call has the right people on it in order to make progress. In the event that some of the participants feel that a topic is still being usefully discussed on list and that discussion should remain there then I'll mediate to either gain consensus one way or another or unblock things some other way etc. I would expect people decide based on the published agenda each month whether they want/need to attend. I hope that at least the maintainers would make an effort to attend if there were topics covering their areas (and as above I will chase where needed). In the short term I intend to stick with the same time (5PM UK, on the second Wednesday of the month) and cadence (monthly) as before, but I might revisit the timing, in particular if we find the proposed topics make the time problematic for key people who want to be involved (e.g. folks in the Far East, for whom 5PM UK is pretty antisocial). Since that schedule puts a call a week today (14th Jan), I'll send out a call for agenda later today. Ian. [0] http://wiki.xen.org/wiki/Xen_Technical_Coordination_Team_%28TCT%29_Meeting ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 5/5] tools: add total/local memory bandwith monitoring
On Wed, Jan 07, 2015 at 07:12:05PM +0800, Chao Peng wrote: [...] > +static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc, > +uint32_t domid, > +xc_psr_cmt_type type, > +uint32_t socketid, > +uint32_t *bandwidth) > +{ > +uint64_t sample1, sample2; > +uint32_t upscaling_factor; > +int retry_attempts = 0; > +int rc; > + > +retry: > +rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid, > + &sample1); > +if (rc < 0) > +return ERROR_FAIL; > + > +usleep(1); > + > +rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid, > + &sample2); > +if (rc < 0) > + return ERROR_FAIL; > + > +if (sample2 < sample1) { > + if (retry_attempts < MBM_SAMPLE_RETRY_MAX) { > + retry_attempts++; > + goto retry; > + } else { > + LOGE(ERROR, "event counter overflowed"); > + return ERROR_FAIL; > + } > +} > + Two things: 1. Use for/while for retry loop, don't use goto. 2. Use "goto out" idiom for error handling. The "goto out" idiom is part of documented libxl error handling section in tools/libxl/CODING_STYLE. You might find that document useful reference. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] check-headers error in staging
After upgrade to current staging, my test packages fail to build: [ 289s] + make -j4 -k -C tools/include/xen-foreign [ 289s] make: Entering directory `/usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign' [ 289s] python mkheader.py arm32 arm32.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-arm.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/xen.h [ 289s] python mkheader.py arm64 arm64.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-arm.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/xen.h [ 289s] python mkheader.py x86_32 x86_32.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-x86/xen-x86_32.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/ [ 289s] python mkheader.py x86_64 x86_64.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-x86/xen-x86_64.h /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/ [ 289s] python mkchecker.py checker.c arm32 arm64 x86_32 x86_64 [ 289s] gcc -Wall -Werror -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -Wdeclaration-after-statement -o checker checker.c [ 289s] ./checker > tmp.size [ 289s] diff -u reference.size tmp.size [ 289s] --- reference.size 2015-01-07 10:28:57.0 + [ 289s] +++ tmp.size 2015-01-07 12:28:33.564911299 + [ 289s] @@ -9,6 +9,6 @@ [ 289s] arch_vcpu_info| 0 0 24 16 [ 289s] vcpu_time_info| 32 32 32 32 [ 289s] vcpu_info | 48 48 64 64 [ 289s] -arch_shared_info | 0 0 268 280 [ 289s] -shared_info |1088108825843368 [ 289s] +arch_shared_info | 0 0 24 48 [ 289s] +shared_info |1088108823403136 [ 289s] [ 289s] make: *** [check-headers] Error 1 this was the last successful build: xen_hg_changeset Mon Dec 15 17:40:12 2014 + hg: 30046:cefc36150538 this build fails: xen_hg_changeset Wed Jan 07 11:28:57 2015 +0100 hg: 30084:88d114af72d8 So it looks like something between git commites dcd8486..dd94cac causes this failure. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Nominations for Xen 4.5 stable tree maintainer.
>> Maybe we should just change the >> document to clarify that an election is only needed if the previous >> maintainer steps down, which is what I think the intention really was. > > Seems reasonable to me, presumably some existing mechanism (i.e. common > sense...) exists if the incumbent goes off the rails or disappears > without resigning etc. Ian, thanks for digging out the link. Looking through the thread we said, the consensus seems to have been to handle stable tree maintainers like maintainers. "Each stable branch has a maintainer who is nominated/volunteers according to the Maintainer Election process described in the project governance document" doesn't imply a formal election. The governance document states * Nomination: A maintainer should nominate himself by proposing a patch to the MAINTAINERS file or mailing a nomination to the project's mailing list. Alternatively another maintainer may nominate a community member. A nomination should explain the contributions of proposed maintainer to the project as well as a scope (set of owned components). Where the case is not obvious, evidence such as specific patches and other evidence supporting the nomination should be cited. * Confirmation: Normally, there is no need for a direct election to confirm a new maintainer. Discussion should happen on the mailing list using the principles of consensus decision making. If there is disagreement or doubt, the project lead or a committer should ask the community manager to arrange a more formal vote So I would propose to replace "Each stable branch has a maintainer who is nominated/volunteers according to the Maintainer Election process described in the project governance document. This will resulting in the MAINTAINERS file in the relevant branch being patched to include the maintainer." with "Each stable branch has a maintainer who is nominated/volunteers according to the Maintainer Election process described in the project governance document. This means that the stable branch maintainer nominates himself or is nominated by another maintainer on the mailing list or through a patch to the MAINTAINERS file on the relevant branch. The principles of consensus decision making are applied, unless there is disagreement, in which case a formal election may be needed. The MAINTAINERS file in the relevant branch will be patched to include the stable branch maintainer." That would mean that we don't have to go through this discussion again for 4.6. Lars On 07/01/2015 12:17, "Ian Campbell" wrote: >On Wed, 2015-01-07 at 11:59 +, Lars Kurth wrote: >> >> >> On 07/01/2015 11:26, "Ian Campbell" wrote: >> >> >On Tue, 2015-01-06 at 11:15 -0500, Konrad Rzeszutek Wilk wrote: >> >> Hello, >> >> >> >> Per http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases: >> >> "Each stable branch has a maintainer who is nominated/volunteers >> >>according to the Maintainer Election >> >> process described in the project governance document >> >>[http://www.xenproject.org/governance.html]. >> >> This will resulting in the MAINTAINERS file in the relevant branch >> >>being patched to include the maintainer." >> >> >> >> For the past year or so Jan Beulich has been the stable tree >>maintainer. >> >> >> >> Since Xen 4.5 has branched that opens up a new stable tree and we can >> >>also >> >> stop maintaining Xen 4.3 stable tree. >> >> >> >> The nominations are open - please volunteer yourself. In case nobody >> >> volunteers I can also take the role. >> >> >> >> I ask folks to finish voting/nominating by Jan 14th so that when Xen >> >>4.5 comes >> >> out we have an viable stable tree maintainer. >> > >> >I'm not sure how voting is supposed to proceed with multiple >>nominations >> >(and with the deadline for nominations apparently being the same as for >> >voting), >> >> Actually, it is questionable whether there are multiple nominations. >> Andrew said "If Jan wants a break, I would be happy to volunteer." > >True, and Konrad said "if nobody else...". > >Still, my +1 for Jan stands. > >> I am also not convinced that we need an election, unless the existing >> maintainer wants to steps down. We never had one in the past. And we >>don't >> have an explicit nomination for Release Managers unless the existing RM >> steps down. >> >> I can't find the mailing list discussion which led to >> http://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases (the >>link >> in the change history seems to be wrong). > >http://lists.xen.org/archives/html/xen-devel/2012-11/msg01391.html >perhaps? > >> Maybe we should just change the >> document to clarify that an election is only needed if the previous >> maintainer steps down, which is what I think the intention really was. >> > >Seems reasonable to me, presumably some existing mechanism (i.e. common >sense...) exists if the incumbent goes off the rails or disappears >without resigning etc. > ___ Xen-devel mailing
Re: [Xen-devel] check-headers error in staging
On 07/01/15 12:37, Olaf Hering wrote: > After upgrade to current staging, my test packages fail to build: > > [ 289s] + make -j4 -k -C tools/include/xen-foreign > [ 289s] make: Entering directory > `/usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign' > [ 289s] python mkheader.py arm32 arm32.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-arm.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/xen.h > [ 289s] python mkheader.py arm64 arm64.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-arm.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/xen.h > [ 289s] python mkheader.py x86_32 x86_32.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-x86/xen-x86_32.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/ > [ 289s] python mkheader.py x86_64 x86_64.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/public/arch-x86/xen-x86_64.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../../../xen/include/ > [ 289s] python mkchecker.py checker.c arm32 arm64 x86_32 x86_64 > [ 289s] gcc -Wall -Werror -Wstrict-prototypes -O2 -fomit-frame-pointer > -fno-strict-aliasing -Wdeclaration-after-statement -o checker checker.c > [ 289s] ./checker > tmp.size > [ 289s] diff -u reference.size tmp.size > [ 289s] --- reference.size 2015-01-07 10:28:57.0 + > [ 289s] +++ tmp.size 2015-01-07 12:28:33.564911299 + > [ 289s] @@ -9,6 +9,6 @@ > [ 289s] arch_vcpu_info| 0 0 24 16 > [ 289s] vcpu_time_info| 32 32 32 32 > [ 289s] vcpu_info | 48 48 64 64 > [ 289s] -arch_shared_info | 0 0 268 280 > [ 289s] -shared_info |1088108825843368 > [ 289s] +arch_shared_info | 0 0 24 48 > [ 289s] +shared_info |1088108823403136 > [ 289s] > [ 289s] make: *** [check-headers] Error 1 > > > this was the last successful build: > xen_hg_changeset Mon Dec 15 17:40:12 2014 + hg: 30046:cefc36150538 > > this build fails: > xen_hg_changeset Wed Jan 07 11:28:57 2015 +0100 hg: 30084:88d114af72d8 > > > So it looks like something between git commites dcd8486..dd94cac causes this > failure. It will be c/s dac6d3b1a "expand x86 arch_shared_info to support linear p2m list" There is a shrink to the size of arch_shared_info, but was argued and accepted as safe to do. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] check-headers error in staging
>>> On 07.01.15 at 13:45, wrote: > On 07/01/15 12:37, Olaf Hering wrote: >> After upgrade to current staging, my test packages fail to build: >> >> [ 289s] + make -j4 -k -C tools/include/xen-foreign >> [ 289s] make: Entering directory > `/usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign' >> [ 289s] python mkheader.py arm32 arm32.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > ./../xen/include/public/arch-arm.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. > /../../xen/include/public/xen.h >> [ 289s] python mkheader.py arm64 arm64.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > ./../xen/include/public/arch-arm.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. > /../../xen/include/public/xen.h >> [ 289s] python mkheader.py x86_32 x86_32.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > ./../xen/include/public/arch-x86/xen-x86_32.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > ./../xen/include/ >> [ 289s] python mkheader.py x86_64 x86_64.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > ./../xen/include/public/arch-x86/xen-x86_64.h > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > ./../xen/include/ >> [ 289s] python mkchecker.py checker.c arm32 arm64 x86_32 x86_64 >> [ 289s] gcc -Wall -Werror -Wstrict-prototypes -O2 -fomit-frame-pointer > -fno-strict-aliasing -Wdeclaration-after-statement -o checker checker.c >> [ 289s] ./checker > tmp.size >> [ 289s] diff -u reference.size tmp.size >> [ 289s] --- reference.size 2015-01-07 10:28:57.0 + >> [ 289s] +++ tmp.size 2015-01-07 12:28:33.564911299 + >> [ 289s] @@ -9,6 +9,6 @@ >> [ 289s] arch_vcpu_info| 0 0 24 16 >> [ 289s] vcpu_time_info| 32 32 32 32 >> [ 289s] vcpu_info | 48 48 64 64 >> [ 289s] -arch_shared_info | 0 0 268 280 >> [ 289s] -shared_info |1088108825843368 >> [ 289s] +arch_shared_info | 0 0 24 48 >> [ 289s] +shared_info |1088108823403136 >> [ 289s] >> [ 289s] make: *** [check-headers] Error 1 >> >> >> this was the last successful build: >> xen_hg_changeset Mon Dec 15 17:40:12 2014 + hg: 30046:cefc36150538 >> >> this build fails: >> xen_hg_changeset Wed Jan 07 11:28:57 2015 +0100 hg: 30084:88d114af72d8 >> >> >> So it looks like something between git commites dcd8486..dd94cac causes this > failure. > > It will be c/s dac6d3b1a "expand x86 arch_shared_info to support linear > p2m list" > > There is a shrink to the size of arch_shared_info, but was argued and > accepted as safe to do. So it looks like I should revert that one then, as it'll cause an unconditional build failure in the tools part of the build afaict. Plus it's not clear how to properly express the now variable size to the checking logic, i.e. resolving the issue may take some time. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
On 07/01/15 12:00, Ian Campbell wrote: > On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: >> From: Jenny Herbert >> >> Introduce gnttab_unmap_refs_async() that can be used to safely unmap >> pages that may be in use (ref count > 1). If the pages are in use the >> unmap is deferred and retried later. This polling is not very clever >> but it should be good enough if the cases where the delay is necessary >> are rare. >> >> This is needed to allow block backends using grant mapping to safely >> use network storage (block or filesystem based such as iSCSI or NFS). >> >> The network storage driver may complete a block request whilst there >> is a queued network packet retry (because the ack from the remote end >> races with deciding to queue the retry). The pages for the retried >> packet would be grant unmapped and the network driver (or hardware) >> would access the unmapped page. > > I thought this had been solved a little while ago by mapping a scratch > page on unmap even for kernel space grant mappings, but both the design > doc and here imply not (i.e. the scratch is for user grant mappings > only), so I must be misremembering. It was only for user grant mappings and it did not fix the case where the page being unmapped was currently dma mapped. This could have resulted in the NIC transmitting sensitive data. e.g., 1. iscsi queues a retransmit with page P (frame F). 2. NIC driver DMA maps and queues frame F on h/w. 3. iscsi completes the I/O. 4. page P is unmapped. 5. response is sent to guest 6. guest reuses frame F. 7. NIC transmits frame F. We don't use this safe unmap mechanism for netback because the zero copy stuff means we don't need it and the polling on the unmap is high latency and only good enough if the polling is needed very rarely. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests
On 07/01/15 12:11, Ian Campbell wrote: > On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: >> In an x86 PV guest, get_user_pages_fast() on a userspace address range >> containing foreign mappings does not work correctly because the M2P >> lookup of the MFN from a userspace PTE may return the wrong page. >> >> Force get_user_pages_fast() to fail on such addresses by marking the PTEs >> as special. >> >> If Xen has XENFEAT_gnttab_map_avail_bits (available since at least >> 4.0), > > http://wiki.xenproject.org/wiki/Xen_Kernel_Feature_Matrix says the dom0 > pvpops already requires >= 4.0 too, which matches my recollection > (something to do with a new APIC interface which upstream insisted on > during upstreaming, IIRC), but both could be out of date... gntdev is usable by driver domains and useful for inter-domain comms so it isn't limited to dom0 use only and Linux still needs to run on Xen 3.2 (I think that's the oldest still available on AWS). Because of the m2p override limitation, gntdev is currently unsafe[1] to use by untrusted userspace apps so there's no (new) security issues here. However, we could disable gntdev if this feature is messing unless overridden by a module option. Opinions on this? David [1] mapping a ref twice or a two refs for the same frame can corrupt kernel state is various exciting ways because of messed up page ref counts. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
On Wed, 2015-01-07 at 13:07 +, David Vrabel wrote: > On 07/01/15 12:00, Ian Campbell wrote: > > On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: > >> From: Jenny Herbert > >> > >> Introduce gnttab_unmap_refs_async() that can be used to safely unmap > >> pages that may be in use (ref count > 1). If the pages are in use the > >> unmap is deferred and retried later. This polling is not very clever > >> but it should be good enough if the cases where the delay is necessary > >> are rare. > >> > >> This is needed to allow block backends using grant mapping to safely > >> use network storage (block or filesystem based such as iSCSI or NFS). > >> > >> The network storage driver may complete a block request whilst there > >> is a queued network packet retry (because the ack from the remote end > >> races with deciding to queue the retry). The pages for the retried > >> packet would be grant unmapped and the network driver (or hardware) > >> would access the unmapped page. > > > > I thought this had been solved a little while ago by mapping a scratch > > page on unmap even for kernel space grant mappings, but both the design > > doc and here imply not (i.e. the scratch is for user grant mappings > > only), so I must be misremembering. > > It was only for user grant mappings and it did not fix the case where > the page being unmapped was currently dma mapped. This could have > resulted in the NIC transmitting sensitive data. > > e.g., > > 1. iscsi queues a retransmit with page P (frame F). > 2. NIC driver DMA maps and queues frame F on h/w. > 3. iscsi completes the I/O. > 4. page P is unmapped. > 5. response is sent to guest > 6. guest reuses frame F. > 7. NIC transmits frame F. You don't actually need Xen to expose this sort of thing, any userspace which reuses a buffer after write() (to e.g. NFS has completed) might expose the new data on the wire in a retransmit. It's certainly much easier to expose with Xen, I'll grant (no pun intended) you that. > We don't use this safe unmap mechanism for netback because the zero copy > stuff means we don't need it and the polling on the unmap is high > latency and only good enough if the polling is needed very rarely. I'd think it should be rare for netback too unless your network is made of wet string. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
On 07/01/15 13:24, Ian Campbell wrote: > On Wed, 2015-01-07 at 13:07 +, David Vrabel wrote: > >> We don't use this safe unmap mechanism for netback because the zero copy >> stuff means we don't need it and the polling on the unmap is high >> latency and only good enough if the polling is needed very rarely. > > I'd think it should be rare for netback too unless your network is made > of wet string. Hmmm. Yes, this might be worth looking at but I would prefer to do it to it as a follow up to this series. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
On Wed, 2015-01-07 at 13:30 +, David Vrabel wrote: > On 07/01/15 13:24, Ian Campbell wrote: > > On Wed, 2015-01-07 at 13:07 +, David Vrabel wrote: > > > >> We don't use this safe unmap mechanism for netback because the zero copy > >> stuff means we don't need it and the polling on the unmap is high > >> latency and only good enough if the polling is needed very rarely. > > > > I'd think it should be rare for netback too unless your network is made > > of wet string. > > Hmmm. Yes, this might be worth looking at but I would prefer to do it > to it as a follow up to this series. Sure. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests
On Wed, 2015-01-07 at 13:23 +, David Vrabel wrote: > On 07/01/15 12:11, Ian Campbell wrote: > > On Tue, 2015-01-06 at 18:57 +, David Vrabel wrote: > >> In an x86 PV guest, get_user_pages_fast() on a userspace address range > >> containing foreign mappings does not work correctly because the M2P > >> lookup of the MFN from a userspace PTE may return the wrong page. > >> > >> Force get_user_pages_fast() to fail on such addresses by marking the PTEs > >> as special. > >> > >> If Xen has XENFEAT_gnttab_map_avail_bits (available since at least > >> 4.0), > > > > http://wiki.xenproject.org/wiki/Xen_Kernel_Feature_Matrix says the dom0 > > pvpops already requires >= 4.0 too, which matches my recollection > > (something to do with a new APIC interface which upstream insisted on > > during upstreaming, IIRC), but both could be out of date... > > gntdev is usable by driver domains and useful for inter-domain comms so > it isn't limited to dom0 use only and Linux still needs to run on Xen > 3.2 (I think that's the oldest still available on AWS). Ah yes, driver domains... > Because of the m2p override limitation, gntdev is currently unsafe[1] to > use by untrusted userspace apps so there's no (new) security issues here. > > However, we could disable gntdev if this feature is messing unless > overridden by a module option. Opinions on this? If it is exploitable by untrusted apps in the new form (the race between mmap and the pte update still is, right?), then that might be best, or only allow root to open it? > > David > > [1] mapping a ref twice or a two refs for the same frame can corrupt > kernel state is various exciting ways because of messed up page ref counts. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] check-headers error in staging
On Wed, 2015-01-07 at 13:05 +, Jan Beulich wrote: > >>> On 07.01.15 at 13:45, wrote: > > On 07/01/15 12:37, Olaf Hering wrote: > >> After upgrade to current staging, my test packages fail to build: > >> > >> [ 289s] + make -j4 -k -C tools/include/xen-foreign > >> [ 289s] make: Entering directory > > `/usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign' > >> [ 289s] python mkheader.py arm32 arm32.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > > ./../xen/include/public/arch-arm.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. > > /../../xen/include/public/xen.h > >> [ 289s] python mkheader.py arm64 arm64.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > > ./../xen/include/public/arch-arm.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. > > /../../xen/include/public/xen.h > >> [ 289s] python mkheader.py x86_32 x86_32.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > > ./../xen/include/public/arch-x86/xen-x86_32.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > > ./../xen/include/ > >> [ 289s] python mkheader.py x86_64 x86_64.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > > ./../xen/include/public/arch-x86/xen-x86_64.h > > /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > > ./../xen/include/ > >> [ 289s] python mkchecker.py checker.c arm32 arm64 x86_32 x86_64 > >> [ 289s] gcc -Wall -Werror -Wstrict-prototypes -O2 -fomit-frame-pointer > > -fno-strict-aliasing -Wdeclaration-after-statement -o checker checker.c > >> [ 289s] ./checker > tmp.size > >> [ 289s] diff -u reference.size tmp.size > >> [ 289s] --- reference.size 2015-01-07 10:28:57.0 + > >> [ 289s] +++ tmp.size 2015-01-07 12:28:33.564911299 + > >> [ 289s] @@ -9,6 +9,6 @@ > >> [ 289s] arch_vcpu_info| 0 0 24 16 > >> [ 289s] vcpu_time_info| 32 32 32 32 > >> [ 289s] vcpu_info | 48 48 64 64 > >> [ 289s] -arch_shared_info | 0 0 268 280 > >> [ 289s] -shared_info |1088108825843368 > >> [ 289s] +arch_shared_info | 0 0 24 48 > >> [ 289s] +shared_info |1088108823403136 > >> [ 289s] > >> [ 289s] make: *** [check-headers] Error 1 > >> > >> > >> this was the last successful build: > >> xen_hg_changeset Mon Dec 15 17:40:12 2014 + hg: 30046:cefc36150538 > >> > >> this build fails: > >> xen_hg_changeset Wed Jan 07 11:28:57 2015 +0100 hg: 30084:88d114af72d8 > >> > >> > >> So it looks like something between git commites dcd8486..dd94cac causes > >> this > > failure. > > > > It will be c/s dac6d3b1a "expand x86 arch_shared_info to support linear > > p2m list" > > > > There is a shrink to the size of arch_shared_info, but was argued and > > accepted as safe to do. > > So it looks like I should revert that one then, as it'll cause an > unconditional build failure in the tools part of the build afaict. Plus > it's not clear how to properly express the now variable size to the > checking logic, i.e. resolving the issue may take some time. It's not sufficient just to reinstate the appropriate amount of padding? ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] check-headers error in staging
On 07/01/15 13:36, Ian Campbell wrote: > On Wed, 2015-01-07 at 13:05 +, Jan Beulich wrote: > On 07.01.15 at 13:45, wrote: >>> On 07/01/15 12:37, Olaf Hering wrote: After upgrade to current staging, my test packages fail to build: [ 289s] + make -j4 -k -C tools/include/xen-foreign [ 289s] make: Entering directory >>> `/usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign' [ 289s] python mkheader.py arm32 arm32.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. >>> ./../xen/include/public/arch-arm.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. >>> /../../xen/include/public/xen.h [ 289s] python mkheader.py arm64 arm64.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. >>> ./../xen/include/public/arch-arm.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. >>> /../../xen/include/public/xen.h [ 289s] python mkheader.py x86_32 x86_32.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. >>> ./../xen/include/public/arch-x86/xen-x86_32.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. >>> ./../xen/include/ [ 289s] python mkheader.py x86_64 x86_64.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. >>> ./../xen/include/public/arch-x86/xen-x86_64.h >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. >>> ./../xen/include/ [ 289s] python mkchecker.py checker.c arm32 arm64 x86_32 x86_64 [ 289s] gcc -Wall -Werror -Wstrict-prototypes -O2 -fomit-frame-pointer >>> -fno-strict-aliasing -Wdeclaration-after-statement -o checker checker.c [ 289s] ./checker > tmp.size [ 289s] diff -u reference.size tmp.size [ 289s] --- reference.size 2015-01-07 10:28:57.0 + [ 289s] +++ tmp.size 2015-01-07 12:28:33.564911299 + [ 289s] @@ -9,6 +9,6 @@ [ 289s] arch_vcpu_info| 0 0 24 16 [ 289s] vcpu_time_info| 32 32 32 32 [ 289s] vcpu_info | 48 48 64 64 [ 289s] -arch_shared_info | 0 0 268 280 [ 289s] -shared_info |1088108825843368 [ 289s] +arch_shared_info | 0 0 24 48 [ 289s] +shared_info |1088108823403136 [ 289s] [ 289s] make: *** [check-headers] Error 1 this was the last successful build: xen_hg_changeset Mon Dec 15 17:40:12 2014 + hg: 30046:cefc36150538 this build fails: xen_hg_changeset Wed Jan 07 11:28:57 2015 +0100 hg: 30084:88d114af72d8 So it looks like something between git commites dcd8486..dd94cac causes this >>> failure. >>> >>> It will be c/s dac6d3b1a "expand x86 arch_shared_info to support linear >>> p2m list" >>> >>> There is a shrink to the size of arch_shared_info, but was argued and >>> accepted as safe to do. >> So it looks like I should revert that one then, as it'll cause an >> unconditional build failure in the tools part of the build afaict. Plus >> it's not clear how to properly express the now variable size to the >> checking logic, i.e. resolving the issue may take some time. > It's not sufficient just to reinstate the appropriate amount of padding? > > The padding was previously in terms of uint64_t's, whereas the new amount of padding is $N - (3 * unsigned long) which is arch dependent and awkward to cleanly express. We discussed that, as it was the last field all zeroes anyway, with the backing page also being zeroed, dropping the padding in its entirety was safe. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] check-headers error in staging
On Wed, 2015-01-07 at 13:39 +, Andrew Cooper wrote: > On 07/01/15 13:36, Ian Campbell wrote: > > On Wed, 2015-01-07 at 13:05 +, Jan Beulich wrote: > > On 07.01.15 at 13:45, wrote: > >>> On 07/01/15 12:37, Olaf Hering wrote: > After upgrade to current staging, my test packages fail to build: > > [ 289s] + make -j4 -k -C tools/include/xen-foreign > [ 289s] make: Entering directory > >>> `/usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign' > [ 289s] python mkheader.py arm32 arm32.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > >>> ./../xen/include/public/arch-arm.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. > >>> /../../xen/include/public/xen.h > [ 289s] python mkheader.py arm64 arm64.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > >>> ./../xen/include/public/arch-arm.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/.. > >>> /../../xen/include/public/xen.h > [ 289s] python mkheader.py x86_32 x86_32.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > >>> ./../xen/include/public/arch-x86/xen-x86_32.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > >>> ./../xen/include/ > [ 289s] python mkheader.py x86_64 x86_64.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > >>> ./../xen/include/public/arch-x86/xen-x86_64.h > >>> /usr/src/packages/BUILD/xen-4.6.0.30084/non-dbg/tools/include/xen-foreign/../. > >>> ./../xen/include/ > [ 289s] python mkchecker.py checker.c arm32 arm64 x86_32 x86_64 > [ 289s] gcc -Wall -Werror -Wstrict-prototypes -O2 -fomit-frame-pointer > >>> -fno-strict-aliasing -Wdeclaration-after-statement -o checker checker.c > [ 289s] ./checker > tmp.size > [ 289s] diff -u reference.size tmp.size > [ 289s] --- reference.size 2015-01-07 10:28:57.0 + > [ 289s] +++ tmp.size 2015-01-07 12:28:33.564911299 + > [ 289s] @@ -9,6 +9,6 @@ > [ 289s] arch_vcpu_info| 0 0 24 16 > [ 289s] vcpu_time_info| 32 32 32 32 > [ 289s] vcpu_info | 48 48 64 64 > [ 289s] -arch_shared_info | 0 0 268 280 > [ 289s] -shared_info |1088108825843368 > [ 289s] +arch_shared_info | 0 0 24 48 > [ 289s] +shared_info |1088108823403136 > [ 289s] > [ 289s] make: *** [check-headers] Error 1 > > > this was the last successful build: > xen_hg_changeset Mon Dec 15 17:40:12 2014 + hg: 30046:cefc36150538 > > this build fails: > xen_hg_changeset Wed Jan 07 11:28:57 2015 +0100 hg: 30084:88d114af72d8 > > > So it looks like something between git commites dcd8486..dd94cac causes > this > >>> failure. > >>> > >>> It will be c/s dac6d3b1a "expand x86 arch_shared_info to support linear > >>> p2m list" > >>> > >>> There is a shrink to the size of arch_shared_info, but was argued and > >>> accepted as safe to do. > >> So it looks like I should revert that one then, as it'll cause an > >> unconditional build failure in the tools part of the build afaict. Plus > >> it's not clear how to properly express the now variable size to the > >> checking logic, i.e. resolving the issue may take some time. > > It's not sufficient just to reinstate the appropriate amount of padding? > > > > > > The padding was previously in terms of uint64_t's, whereas the new > amount of padding is $N - (3 * unsigned long) which is arch dependent > and awkward to cleanly express. We discussed that, as it was the last > field all zeroes anyway, with the backing page also being zeroed, > dropping the padding in its entirety was safe. I think this could be fixed by updating reference.size, it already has separate columns for x86_32 and x86_64 (I'd originally thought Jan meant variable as in dynamic, rather than just different between subarches). In fact they appear to have been different in size before anyway, they are just differently different now... Otherwise either ifdef the padding or make the new fields uint64_t, neither seems all that intolerable. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [linux-3.10 test] 33190: regressions - trouble: blocked/broken/fail/pass
flight 33190 linux-3.10 real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/33190/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemut-winxpsp3 7 windows-install fail REGR. vs. 26303 build-armhf-libvirt 5 libvirt-buildfail in 33120 REGR. vs. 26303 build-i386-libvirt5 libvirt-buildfail in 33120 REGR. vs. 26303 build-amd64-libvirt 5 libvirt-buildfail in 33120 REGR. vs. 26303 Tests which are failing intermittently (not blocking): test-armhf-armhf-xl 3 host-install(3) broken pass in 33120 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-rumpuserxen-amd64 11 rumpuserxen-demo-xenstorels/xenstorels fail blocked in 26303 build-armhf-libvirt 3 host-install(3)broken blocked in 26303 build-i386-libvirt3 host-install(3)broken blocked in 26303 build-amd64-libvirt 3 host-install(3)broken blocked in 26303 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 guest-localmigrate fail blocked in 26303 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 26303 test-amd64-amd64-xl-winxpsp3 7 windows-install fail like 26303 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-armhf-armhf-xl 5 xen-boot fail in 33120 never pass version targeted for testing: linuxa472efc75989c7092187fe00f0400e02c495c436 baseline version: linuxbe67db109090b17b56eb8eb2190cd70700f107aa 816 people touched revisions under test, not listing them all jobs: build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt broken build-armhf-libvirt broken build-i386-libvirt broken build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen pass build-i386-rumpuserxen pass test-amd64-amd64-xl pass test-armhf-armhf-xl broken test-amd64-i386-xl pass test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64fail test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64
[Xen-devel] [PATCH] x86/xen: Free bootmem in free_p2m_page() during early boot
With recent changes in p2m we now have legitimate cases when p2m memory needs to be freed during early boot (i.e. before slab is initialized). Signed-off-by: Boris Ostrovsky --- arch/x86/xen/p2m.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index edbc7a6..df8acb4 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -167,10 +167,13 @@ static void * __ref alloc_p2m_page(void) return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT); } -/* Only to be called in case of a race for a page just allocated! */ static void free_p2m_page(void *p) { - BUG_ON(!slab_is_available()); + if (unlikely(!slab_is_available())) { + free_bootmem((unsigned long)p, PAGE_SIZE); + return; + } + free_page((unsigned long)p); } -- 1.9.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
On 01/07/2015 04:04 AM, Dario Faggioli wrote: On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote: .. and use this new interface to display it along with CPU topology and NUMA information when 'xl info -n' command is issued Also, can we see how an `xl info -n' looks, on an IONUMA system? @@ -195,6 +195,24 @@ int xc_cputopoinfo(xc_interface *xch, return 0; } +int xc_pcitopoinfo(xc_interface *xch, + xc_pcitopoinfo_t *put_info) +{ +int ret; +DECLARE_SYSCTL; + +sysctl.cmd = XEN_SYSCTL_pcitopoinfo; + +memcpy(&sysctl.u.pcitopoinfo, put_info, sizeof(*put_info)); + +if ( (ret = do_sysctl(xch, &sysctl)) != 0 ) +return ret; + +memcpy(put_info, &sysctl.u.pcitopoinfo, sizeof(*put_info)); + +return 0; +} @@ -5121,6 +5121,64 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out) return ret; } +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs) +{ +GC_INIT(ctx); +xc_pcitopoinfo_t tinfo; +DECLARE_HYPERCALL_BUFFER(xen_sysctl_pcitopo_t, pcitopo); +libxl_pcitopology *ret = NULL; +int i, rc; + I see from where this comes from. However, at least from new functions, I think we should avoid using things like DECLARE_HYPERCALL_BUFFER etc., in libxl. They belong in libxc, IMO. This is basically what Andrew was doing here (although that was on xc_{topology,numa}info: http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/hwloc-support-experimental http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blobdiff;f=tools/libxc/xc_misc.c;h=4f672cead5a4d2b1fc23bcddb6fb49e4d6ec72b5;hp=330345459176961cacc7fda506e843831fe7156a;hb=3585994405b6a73c137309dd4be91f48c71e4903;hpb=9a80d5056766535ac624774b96495f8b97b1d28b Basically, what I'm suggesting is to have xc_pcitopoinfo() to do most of the work, with libxl_get_pci_topology being just a wrapper to it. Maybe then I should update libxl_get_cpu_topology() and libxl_get_numainfo() as well for code uniformity. -boris In fact, if you grep/cscope for things like DECLARE_HYPERCALL_BUFFER in tools/libxl, the *only* results are these: libxl.c libxl_get_cpu_topology 5076 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap); libxl.c libxl_get_cpu_topology 5077 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap); libxl.c libxl_get_cpu_topology 5078 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap); libxl.c libxl_get_numainfo 5142 DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); libxl.c libxl_get_numainfo 5143 DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); libxl.c libxl_get_numainfo 5144 DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists); And I think we should work toward removing these too, rather than adding new ones! :-) Regards, Dario ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/xen: Free bootmem in free_p2m_page() during early boot
On 01/07/2015 03:08 PM, Boris Ostrovsky wrote: With recent changes in p2m we now have legitimate cases when p2m memory needs to be freed during early boot (i.e. before slab is initialized). Signed-off-by: Boris Ostrovsky Reviewed-by: Juergen Gross --- arch/x86/xen/p2m.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index edbc7a6..df8acb4 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -167,10 +167,13 @@ static void * __ref alloc_p2m_page(void) return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT); } -/* Only to be called in case of a race for a page just allocated! */ static void free_p2m_page(void *p) { - BUG_ON(!slab_is_available()); + if (unlikely(!slab_is_available())) { + free_bootmem((unsigned long)p, PAGE_SIZE); + return; + } + free_page((unsigned long)p); } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
On 01/07/2015 04:06 AM, Jan Beulich wrote: On 06.01.15 at 03:18, wrote: @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) } else pdev_info.is_virtfn = 0; -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); + +if ( add.flags & XEN_PCI_DEV_PXM ) +{ +uint32_t pxm; +int optarr_off = offsetof(struct physdev_pci_device_add, optarr) / unsigned int or size_t. --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -56,6 +56,8 @@ struct pci_dev { u8 phantom_stride; +int node; /* NUMA node */ I thought I asked about this on v1 already: Does this really need to be an int, when commonly node numbers are stored in u8/unsigned char? Shrinking the field size would prevent the structure size from growing... I kept this field as an int to be able to store NUMA_NO_NODE which I thought to be (int)-1. But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to (int)-1 by pxm_to_node(). Given that there is a number of tests for NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() return u8 as well? Of course an additional question would be whether the node wouldn't better go into struct arch_pci_dev - that depends on whether we expect ARM to be using NUMA... Since we have CPU topology in common code I thought this would be arch-independent as well. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
On 01/07/2015 04:12 AM, Jan Beulich wrote: On 06.01.15 at 14:41, wrote: On 06/01/15 02:18, Boris Ostrovsky wrote: Instead of copying data for each field in xen_sysctl_topologyinfo separately put cpu/socket/node into a single structure and do a single copy for each processor. There is also no need to copy whole op to user at the end, max_cpu_index is sufficient Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact that these are used for CPU topology. Subsequent patch will add support for PCI topology sysctl. Signed-off-by: Boris Ostrovsky If we are going to change the hypercall, then can we see about making it a stable interface (i.e. not a sysctl/domctl)? There are non-toolstack components which might want/need access to this information. (i.e. I am still looking for a reasonable way to get this information from Xen in hwloc) In which case leaving the sysctl alone and just adding a new non-sysctl interface should be considered. I'd expect IO NUMA information to be used together with CPU topology information so I am not sure how useful this would be. Unless we create a similar interface for that (CPU/memory) as well. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
On Tue, Jan 06, 2015 at 11:18:21PM -0800, Andy Lutomirski wrote: > On Tue, Jan 6, 2015 at 9:38 PM, Paolo Bonzini wrote: > > > > > > On 06/01/2015 17:56, Andy Lutomirski wrote: > >> Still no good. We can migrate a bunch of times so we see the same CPU > >> all three times > > > > There are no three times. The CPU you see here: > > > >>> > >>> > >>> // ... compute nanoseconds from pvti and tsc ... > >>> rmb(); > >>> } while(v != pvti->version); > > > > is the same you read here: > > > >>> cpu = get_cpu(); > > > > The algorithm is: > > I still don't see why this is safe, and I think that the issue is that > you left out part of the loop. > > > > > 1) get a consistent (cpu, version, tsc) > > > >1.a) get cpu > > Suppose we observe cpu 0. > > >1.b) get pvti[cpu]->version, ignoring low bit > > Missing step, presumably here: read pvti[cpu]->tsc_timestamp, scale, > etc. This could all execute on vCPU 1. We could read values that are > inconsistent with each other. > > >1.c) get (tsc, cpu) > > Now we could end up back on vCPU 0. > > >1.d) if cpu from 1.a and 1.c do not match, loop > >1.e) if pvti[cpu] was being updated, we'll loop later > > > > 2) compute nanoseconds from pvti[cpu] and tsc > > > > 3) if pvti[cpu] changed under our feet during (2), i.e. version doesn't > > match, retry. > > > > As long as the CPU is consistent between get_cpu() and rdtscp(), there > > is no problem with migration, because pvti is always accessed for that > > one CPU. If there were any problem, it would be caught by the version > > check. Writing it down with two nested do...whiles makes it clearer IMHO. > > Why exactly would it be caught by the version check? > > My ugly patch tries to make the argument that, at any point at which > we observe ourselves to be on a given vCPU, that vCPU won't be > updating pvti. That means that, if version doesn't change between two > consecutive observations that we're on that vCPU, then we're okay. > This IMO sucks. It's fragile, it's hard to make a coherent argument > about correctness, and it requires at least two getcpu-like operations > to read the time. Those operations are *slow*. One is much better > than two, and zero is much better than one. > > > > >> and *still* don't get a consistent read, unless we > >> play nasty games with lots of version checks (I have a patch for that, > >> but I don't like it very much). The patch is here: > >> > >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoia&id=a69754dc5ff33f5187162b5338854ad23dd7be8d > >> > >> but I don't like it. > >> > >> Thus far, I've been told unambiguously that a guest can't observe pvti > >> while it's being written, and I think you're now telling me that this > >> isn't true and that a guest *can* observe pvti while it's being > >> written while the low bit of the version field is not set. If so, > >> this is rather strongly incompatible with the spec in the KVM docs. > > > > Where am I saying that? > > I thought the conclusion from what you and Marcelo pointed out about > the code was that, once the first vCPU updated its pvti, it could > start running guest code while the other vCPUs are still updating > pvti, so its guest code can observe the other vCPUs mid-update. > > >> Also, if you do this, can you also make setting and clearing > >> STABLE_BIT properly atomic across all vCPUs? Or at least do something > >> like setting it last and clearing it first on vPCU 0? > > > > That would be nice if you want to make the pvclock area fit in a single > > page. However, it would have to be a separate flag bit, or a separate > > CPUID feature. > > It would be nice to have. Although I think that fixing the host to > increment version pre-update and post-update may actually be good > enough. Is there any case in which it would fail in practice if we > made that fix and always looked at pvti 0? TSC_STABLE_BIT -> ~TSC_STABLE_BIT transition steps would finish but still allow VCPU-1 to use stale values from VCPU-0. To fix, do one of the following: 1) Check validity of local TSC_STABLE_BIT in addition (slow). 2) Perform update of VCPU-0 pvclock area before allowing any other VCPU to VM-entry. > > --Andy > > > > > Paolo > > > > -- > Andy Lutomirski > AMA Capital Management, LLC ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
On Wed, 2015-01-07 at 09:15 -0500, Boris Ostrovsky wrote: > On 01/07/2015 04:04 AM, Dario Faggioli wrote: > > This is basically what Andrew was doing here (although that was on > > xc_{topology,numa}info: > > > > http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/hwloc-support-experimental > > http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blobdiff;f=tools/libxc/xc_misc.c;h=4f672cead5a4d2b1fc23bcddb6fb49e4d6ec72b5;hp=330345459176961cacc7fda506e843831fe7156a;hb=3585994405b6a73c137309dd4be91f48c71e4903;hpb=9a80d5056766535ac624774b96495f8b97b1d28b > > > > Basically, what I'm suggesting is to have xc_pcitopoinfo() to do most of > > the work, with libxl_get_pci_topology being just a wrapper to it. > > > Maybe then I should update libxl_get_cpu_topology() and > libxl_get_numainfo() as well for code uniformity. > Yes, that would be ideal. I wanted to mention it, but didn't, because I felt it was not fair to "ask" you that.. but yes, since you're reworking them anyway, that really would IMO be wonderful. :-) Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCHv1 0/3] x86/xen: fixes and improvements to linear p2m
The linear p2m changes in 3.19-rc1 is broken with some dom0 configurations. While trying to fix it I also noticed that get_phys_to_machine() could be optimized for the common case. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 3/3] x86/xen: optimize get_phys_to_machine()
The page table walk is only needed to distinguish between identity and missing, both of which have INVALID_P2M_ENTRY. Signed-off-by: David Vrabel --- arch/x86/xen/p2m.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index edbc7a6..a848201 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -405,8 +405,7 @@ void __init xen_vmalloc_p2m_tree(void) unsigned long get_phys_to_machine(unsigned long pfn) { - pte_t *ptep; - unsigned int level; + unsigned long mfn; if (unlikely(pfn >= xen_p2m_size)) { if (pfn < xen_max_p2m_pfn) @@ -414,19 +413,26 @@ unsigned long get_phys_to_machine(unsigned long pfn) return IDENTITY_FRAME(pfn); } + + mfn = xen_p2m_addr[pfn]; - ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level); - BUG_ON(!ptep || level != PG_LEVEL_4K); + if (unlikely(mfn == INVALID_P2M_ENTRY)) { + pte_t *ptep; + unsigned int level; - /* -* The INVALID_P2M_ENTRY is filled in both p2m_*identity -* and in p2m_*missing, so returning the INVALID_P2M_ENTRY -* would be wrong. -*/ - if (pte_pfn(*ptep) == PFN_DOWN(__pa(p2m_identity))) - return IDENTITY_FRAME(pfn); + ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level); + BUG_ON(!ptep || level != PG_LEVEL_4K); + + /* +* The INVALID_P2M_ENTRY is filled in both p2m_*identity +* and in p2m_*missing, so returning the INVALID_P2M_ENTRY +* would be wrong. +*/ + if (pte_pfn(*ptep) == PFN_DOWN(__pa(p2m_identity))) + return IDENTITY_FRAME(pfn); + } - return xen_p2m_addr[pfn]; + return mfn; } EXPORT_SYMBOL_GPL(get_phys_to_machine); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
On 07/01/15 14:42, Boris Ostrovsky wrote: > On 01/07/2015 04:06 AM, Jan Beulich wrote: > On 06.01.15 at 03:18, wrote: >>> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, >>> XEN_GUEST_HANDLE_PARAM(void) arg) >>> } >>> else >>> pdev_info.is_virtfn = 0; >>> -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); >>> + >>> +if ( add.flags & XEN_PCI_DEV_PXM ) >>> +{ >>> +uint32_t pxm; >>> +int optarr_off = offsetof(struct >>> physdev_pci_device_add, optarr) / >> unsigned int or size_t. >> >>> --- a/xen/include/xen/pci.h >>> +++ b/xen/include/xen/pci.h >>> @@ -56,6 +56,8 @@ struct pci_dev { >>> u8 phantom_stride; >>> +int node; /* NUMA node */ >> I thought I asked about this on v1 already: Does this really need to be >> an int, when commonly node numbers are stored in u8/unsigned char? >> Shrinking the field size would prevent the structure size from >> growing... > > > I kept this field as an int to be able to store NUMA_NO_NODE which I > thought to be (int)-1. > > But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to > (int)-1 by pxm_to_node(). Given that there is a number of tests for > NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() > return u8 as well? I noticed this as well, and found it quite counter intuitive. I would suggest fixing NUMA_NO_NODE to -1 and removing some of the type-punning. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/3] x86/xen: add extra memory for remapped frames during setup
If the non-RAM regions in the e820 memory map are larger than the size of the initial balloon, a BUG was triggered as the frames are remaped beyond the limit of the linear p2m. The frames are remapped into the initial balloon area (xen_extra_mem) but not enough of this is available. Ensure enough extra memory regions are added for these remapped frames. Signed-off-by: David Vrabel --- arch/x86/xen/setup.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 664dffc..feb6d86 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -366,7 +366,7 @@ static void __init xen_do_set_identity_and_remap_chunk( static unsigned long __init xen_set_identity_and_remap_chunk( const struct e820entry *list, size_t map_size, unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages, unsigned long remap_pfn, - unsigned long *released) + unsigned long *released, unsigned long *remapped) { unsigned long pfn; unsigned long i = 0; @@ -404,6 +404,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk( /* Update variables to reflect new mappings. */ i += size; remap_pfn += size; + *remapped += size; } /* @@ -420,12 +421,13 @@ static unsigned long __init xen_set_identity_and_remap_chunk( static void __init xen_set_identity_and_remap( const struct e820entry *list, size_t map_size, unsigned long nr_pages, - unsigned long *released) + unsigned long *released, unsigned long *remapped) { phys_addr_t start = 0; unsigned long last_pfn = nr_pages; const struct e820entry *entry; unsigned long num_released = 0; + unsigned long num_remapped = 0; int i; /* @@ -452,12 +454,13 @@ static void __init xen_set_identity_and_remap( last_pfn = xen_set_identity_and_remap_chunk( list, map_size, start_pfn, end_pfn, nr_pages, last_pfn, - &num_released); + &num_released, &num_remapped); start = end; } } *released = num_released; + *remapped = num_remapped; pr_info("Released %ld page(s)\n", num_released); } @@ -577,6 +580,7 @@ char * __init xen_memory_setup(void) struct xen_memory_map memmap; unsigned long max_pages; unsigned long extra_pages = 0; + unsigned long remapped_pages; int i; int op; @@ -626,9 +630,10 @@ char * __init xen_memory_setup(void) * underlying RAM. */ xen_set_identity_and_remap(map, memmap.nr_entries, max_pfn, - &xen_released_pages); + &xen_released_pages, &remapped_pages); extra_pages += xen_released_pages; + extra_pages += remapped_pages; /* * Clamp the amount of extra memory to a EXTRA_MEM_RATIO -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/3] x86/xen: don't count how many PFNs are identity mapped
This accounting is just used to print a diagnostic message that isn't very useful. Signed-off-by: David Vrabel --- arch/x86/xen/setup.c | 27 +-- 1 file changed, 9 insertions(+), 18 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index dfd77de..664dffc 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -229,15 +229,14 @@ static int __init xen_free_mfn(unsigned long mfn) * as a fallback if the remapping fails. */ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn, - unsigned long end_pfn, unsigned long nr_pages, unsigned long *identity, - unsigned long *released) + unsigned long end_pfn, unsigned long nr_pages, unsigned long *released) { - unsigned long len = 0; unsigned long pfn, end; int ret; WARN_ON(start_pfn > end_pfn); + /* Release pages first. */ end = min(end_pfn, nr_pages); for (pfn = start_pfn; pfn < end; pfn++) { unsigned long mfn = pfn_to_mfn(pfn); @@ -250,16 +249,14 @@ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn, WARN(ret != 1, "Failed to release pfn %lx err=%d\n", pfn, ret); if (ret == 1) { + (*released)++; if (!__set_phys_to_machine(pfn, INVALID_P2M_ENTRY)) break; - len++; } else break; } - /* Need to release pages first */ - *released += len; - *identity += set_phys_range_identity(start_pfn, end_pfn); + set_phys_range_identity(start_pfn, end_pfn); } /* @@ -318,7 +315,6 @@ static void __init xen_do_set_identity_and_remap_chunk( unsigned long ident_pfn_iter, remap_pfn_iter; unsigned long ident_end_pfn = start_pfn + size; unsigned long left = size; - unsigned long ident_cnt = 0; unsigned int i, chunk; WARN_ON(size == 0); @@ -347,8 +343,7 @@ static void __init xen_do_set_identity_and_remap_chunk( xen_remap_mfn = mfn; /* Set identity map */ - ident_cnt += set_phys_range_identity(ident_pfn_iter, - ident_pfn_iter + chunk); + set_phys_range_identity(ident_pfn_iter, ident_pfn_iter + chunk); left -= chunk; } @@ -371,7 +366,7 @@ static void __init xen_do_set_identity_and_remap_chunk( static unsigned long __init xen_set_identity_and_remap_chunk( const struct e820entry *list, size_t map_size, unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages, unsigned long remap_pfn, - unsigned long *identity, unsigned long *released) + unsigned long *released) { unsigned long pfn; unsigned long i = 0; @@ -386,8 +381,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk( /* Do not remap pages beyond the current allocation */ if (cur_pfn >= nr_pages) { /* Identity map remaining pages */ - *identity += set_phys_range_identity(cur_pfn, - cur_pfn + size); + set_phys_range_identity(cur_pfn, cur_pfn + size); break; } if (cur_pfn + size > nr_pages) @@ -398,7 +392,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk( if (!remap_range_size) { pr_warning("Unable to find available pfn range, not remapping identity pages\n"); xen_set_identity_and_release_chunk(cur_pfn, - cur_pfn + left, nr_pages, identity, released); + cur_pfn + left, nr_pages, released); break; } /* Adjust size to fit in current e820 RAM region */ @@ -410,7 +404,6 @@ static unsigned long __init xen_set_identity_and_remap_chunk( /* Update variables to reflect new mappings. */ i += size; remap_pfn += size; - *identity += size; } /* @@ -430,7 +423,6 @@ static void __init xen_set_identity_and_remap( unsigned long *released) { phys_addr_t start = 0; - unsigned long identity = 0; unsigned long last_pfn = nr_pages; const struct e820entry *entry; unsigned long num_released = 0; @@ -460,14 +452,13 @@ static void __init xen_set_identity_and_remap( last_pfn = xen_set_identity_and_remap_chunk( list, map_size, start_pfn, end_pfn, nr_pages, last_pfn, - &identity, &num_released); + &num_released);
Re: [Xen-devel] [Bugfix] x86/apic: Fix xen IRQ allocation failure caused by commit b81975eade8c
On Wed, Jan 07, 2015 at 02:13:49PM +0800, Jiang Liu wrote: > Commit b81975eade8c ("x86, irq: Clean up irqdomain transition code") > breaks xen IRQ allocation because xen_smp_prepare_cpus() doesn't invoke > setup_IO_APIC(), so no irqdomains created for IOAPICs and > mp_map_pin_to_irq() fails at the very beginning. > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -2369,31 +2369,29 @@ static void ioapic_destroy_irqdomain(int idx) > ioapics[idx].pin_info = NULL; > } > > -void __init setup_IO_APIC(void) > +void __init setup_IO_APIC(bool xen_smp) > { > int ioapic; > > - /* > - * calling enable_IO_APIC() is moved to setup_local_APIC for BP > - */ > - io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL; > + if (!xen_smp) { > + apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n"); > + io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL; > + > + /* Set up IO-APIC IRQ routing. */ > + x86_init.mpparse.setup_ioapic_ids(); > + sync_Arb_IDs(); > + } Is there a specific reason that this cannot run in all cases? What I am asking is why are we doing a special case here? The description at the top implied that we were just missing an call to setup_IO_APIC.. > > - apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n"); > for_each_ioapic(ioapic) > BUG_ON(mp_irqdomain_create(ioapic)); > - > - /* > - * Set up IO-APIC IRQ routing. > - */ > - x86_init.mpparse.setup_ioapic_ids(); > - > - sync_Arb_IDs(); > setup_IO_APIC_irqs(); > - init_IO_APIC_traps(); > - if (nr_legacy_irqs()) > - check_timer(); > - > ioapic_initialized = 1; > + > + if (!xen_smp) { > + init_IO_APIC_traps(); > + if (nr_legacy_irqs()) > + check_timer(); > + } > } > > /* > diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c > index 4c071aeb8417..7eb0283901fa 100644 > --- a/arch/x86/xen/smp.c > +++ b/arch/x86/xen/smp.c > @@ -326,7 +326,10 @@ static void __init xen_smp_prepare_cpus(unsigned int > max_cpus) > > xen_raw_printk(m); > panic(m); > + } else { > + setup_IO_APIC(true); > } > + > xen_init_lock_cpu(0); > > smp_store_boot_cpu_info(); > -- > 1.7.10.4 > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/7] tools/hotplug: remove SELinux options from var-lib-xenstored.mount
On Wed, Jan 07, 2015 at 09:31:50AM +, Ian Campbell wrote: > On Wed, 2015-01-07 at 10:23 +0100, Olaf Hering wrote: > > On Tue, Jan 06, Ian Campbell wrote: > > > > > On Fri, 2014-12-19 at 12:25 +0100, Olaf Hering wrote: > > > > ... > > > > > Acked-by: Ian Campbell > > > > > > (on commit s/Appearently/Apparently/; s/non-existant/non-existent/ in > > > the commit log) > > > > I made typos also in other commit messages. Should I resend the entire > > series, or will this be done during commit? > > Looks like Konrad already committed, I don't know if he fixed the typos > (I suppose it doesn't matter now either way). I did the changes you pointed our here. > > Ian. > > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
On 01/07/2015 04:21 AM, Jan Beulich wrote: On 06.01.15 at 03:18, wrote: --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -365,6 +365,66 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) } break; #ifdef HAS_PCI +case XEN_SYSCTL_pcitopoinfo: +{ +xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo; + +if ( guest_handle_is_null(ti->pcitopo) || + (ti->first_dev >= ti->num_devs) ) +{ +ret = -EINVAL; +break; +} + +for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ ) +{ +xen_sysctl_pcitopo_t pcitopo; +struct pci_dev *pdev; + +if ( copy_from_guest_offset(&pcitopo, ti->pcitopo, +ti->first_dev, 1) ) +{ +ret = -EFAULT; +break; +} + +spin_lock(&pcidevs_lock); +pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus, +pcitopo.pcidev.devfn); +if ( !pdev || (pdev->node == NUMA_NO_NODE) ) +pcitopo.node = INVALID_TOPOLOGY_ID; +else +pcitopo.node = pdev->node; Are hypervisor-internal node numbers really meaningful to the caller? This is the same information (pxm -> node mapping ) that we provide in XEN_SYSCTL_topologyinfo (renamed in this series to XEN_SYSCTL_cputopoinfo). Given that I expect the two topologies to be used together I think the answer is yes. @@ -463,7 +464,7 @@ typedef struct xen_sysctl_lockprof_op xen_sysctl_lockprof_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t); /* XEN_SYSCTL_cputopoinfo */ -#define INVALID_TOPOLOGY_ID (~0U) +#define INVALID_TOPOLOGY_ID (~0U) /* Also used by pcitopo */ Better extend the preceding comment. I mentioned it to Wei yesterday that the file is structured (or at least appears to me to be structured) in such a way that these top comments mark sections of definitions for each sysctl. And so I thought that I'd be breaking this convention if I were to extend the comment. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 7/7] tools/hotplug: add wrapper to start xenstored
On Wed, Jan 07, 2015 at 10:49:38AM +0100, Olaf Hering wrote: > On Tue, Jan 06, Ian Jackson wrote: > > > Olaf Hering writes ("[PATCH 7/7] tools/hotplug: add wrapper to start > > xenstored"): > > > The shell wrapper in xenstored.service does not handle XENSTORE_TRACE. > > ... > > > +XENSTORED_LIBEXEC = xenstored.sh > > > > Should be in /etc as previously discussed. Previously I wrote: > > > > Bottom line: as relevant maintainer, I'm afraid I'm going to insist > > that this script be in /etc. > > > > I'm disappointed. It is not acceptable to resubmit a change ignoring > > such unequivocal feedback. > > Plain "/etc" wont work, I think. /etc/xen/scripts perhaps? But see my > other reply to IanC, maybe there is a way to avoid the wrapper. > > And after having some time to think about this: If one has a need to > adjust something, then this could be done in the xencommons script right > away. In other words, the modification can be done there instead of > calling the wrapper. > > > Nacked-by: Ian Jackson > > > > > +hotplug/Linux/xenstored.sh > > > > Although many of the existing hotplug scripts have this notion of > > calling things "foo.sh" because they happen to be written in shell, I > > think this is bad practice. > > > > I would prefer xenstored-wrap or some such. (My co-maintainers may > > disagree...) But this is a bit of a bikeshed issue. > > I agree. Initally I had xenstored-launcher in mind. > > > > echo -n Starting $XENSTORED... > > > - $XENSTORED --pid-file /var/run/xenstored.pid $XENSTORED_ARGS > > > + XENSTORED=$XENSTORED \ > > > + XENSTORED_TRACE=$XENSTORED_TRACE \ > > > + XENSTORED_ARGS=$XENSTORED_ARGS \ > > > + ${LIBEXEC_BIN}/xenstored.sh --pid-file > > > /var/run/xenstored.pid > > > > It might be easier to "." xenstore-wrap. Failing that using `export' > > will avoid this rather odd and repetitive style. > > I think thats a good idea. Something like this may work, doing the "." > and the "exec" in the subshell: > > ( > set -- --pid-file /var/run/xenstored.pid > . xenstored.sh > ) > > > > > diff --git a/tools/hotplug/Linux/xenstored.sh.in > > > b/tools/hotplug/Linux/xenstored.sh.in > > > new file mode 100644 > > > index 000..dc806ee > > > --- /dev/null > > > +++ b/tools/hotplug/Linux/xenstored.sh.in > > > @@ -0,0 +1,6 @@ > > > +#!/bin/sh > > > +if test -n "$XENSTORED_TRACE" > > > +then > > > + XENSTORED_ARGS=" -T /var/log/xen/xenstored-trace.log" > > > +fi > > > +exec $XENSTORED $@ $XENSTORED_ARGS > > > > This should probably have "" around "$@" just in case. > > Ok. > > > I will wait for results from SELinux testing before respinning this > patch. It did work for me (I did an 'Tested-by') in my email. Please do keep in mind that today is the last for commits for Xen 4.5. No pressure :-) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/7 v3] tools/hotplug: systemd changes for 4.5
On Wed, Jan 07, 2015 at 10:53:06AM +0100, Olaf Hering wrote: > On Mon, Jan 05, Konrad Rzeszutek Wilk wrote: > > > +Release Issues > > +== > > + > > +While we did the utmost to get a release out, there are certain > > +fixes which were not complete on time. As such please reference this > > +section if you are running into trouble. > > + > > +* systemd not working with Fedora Core 20, 21 or later (systemctl > > + reports xenstore failing to start). > > + > > + Systemd support is now part of Xen source code. While utmost work has > > + been done to make the systemd files compatible across all the > > + distributions, there might issues when using systemd files from > > + Xen sources. The work-around is to define an mount entry in > > + /etc/fstab as follow: > > + > > + tmpfs /var/lib/xenstored tmpfs > > + mode=755,context="system_u:object_r:xenstored_var_lib_t:s0" 0 0 > > + > > + > > Shouldnt this go into a new SELinux section in the INSTALL file? It is going in the web-page for 'Release Issues' and such. > > Its my understanding that the reported SELinux failure is not only > related to the context= mount option, but also to the socket passing > from systemd. I couldn't spot any errors in SELinux for this. Perhaps I had misconfigured? > > > Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: don't crash when PTE bits 52 and up are non-zero
On Wed, Jan 07, 2015 at 10:15:39AM +, Jan Beulich wrote: > >>> On 23.12.14 at 07:52, wrote: > >> From: Jan Beulich [mailto:jbeul...@suse.com] > >> Sent: Friday, December 19, 2014 7:26 PM > >> > >> This can (and will) be legitimately the case when sharing page tables > >> with EPT (more of a problem before p2m_access_rwx became zero, but > >> still possible even now when other than that is the default for a > >> guest), leading to an unconditional crash (in print_vtd_entries()) > >> when a DMA remapping fault occurs. > > > > could you elaborate the scenarios when bits 52+ are non-zero? > > > >> > >> Signed-off-by: Jan Beulich > > > > but the changes looks reasonable to me. > > > > Signed-off-by: Kevin Tian > > I translated this to a Reviewed-by, as S-o-b doesn't seem to make > sense here. > > Konrad - please indicate whether this can also go into 4.5. Yes. Release-Acked-by: Konrad Rzeszutek Wilk > > Jan > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/7 v3] tools/hotplug: systemd changes for 4.5
On Wed, Jan 07, Konrad Rzeszutek Wilk wrote: > On Wed, Jan 07, 2015 at 10:53:06AM +0100, Olaf Hering wrote: > > Its my understanding that the reported SELinux failure is not only > > related to the context= mount option, but also to the socket passing > > from systemd. > > I couldn't spot any errors in SELinux for this. Perhaps I had misconfigured? Last year you said xenstored did not start, even with patch #1 applied. I dont know if you added the required fstab changes. So if current staging works fine with SELinux enabled we could go with this change for the service file, instead of the wrapper: ExecStart=/usr/bin/env $XENSTORED --no-fork $XENSTORED_ARGS Does that work for you? If yes, lets get rid of the XENSTORED_TRACE= boolean and use a new XENSTORED_ARGS= variable instead. That would make patch #7 alot simpler. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 07.01.15 at 15:42, wrote: > On 01/07/2015 04:06 AM, Jan Beulich wrote: > On 06.01.15 at 03:18, wrote: >>> --- a/xen/include/xen/pci.h >>> +++ b/xen/include/xen/pci.h >>> @@ -56,6 +56,8 @@ struct pci_dev { >>> >>> u8 phantom_stride; >>> >>> +int node; /* NUMA node */ >> I thought I asked about this on v1 already: Does this really need to be >> an int, when commonly node numbers are stored in u8/unsigned char? >> Shrinking the field size would prevent the structure size from growing... > > I kept this field as an int to be able to store NUMA_NO_NODE which I > thought to be (int)-1. > > But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to > (int)-1 by pxm_to_node(). Given that there is a number of tests for > NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() > return u8 as well? I think that would make sense, together with fixing up one of the three callers in VT-d code (from alloc_pgtable_maddr()); the other two look correct already. >> Of course an additional question would be whether the node wouldn't >> better go into struct arch_pci_dev - that depends on whether we >> expect ARM to be using NUMA... > > Since we have CPU topology in common code I thought this would be > arch-independent as well. Not sure what you're referring to here: What common piece of data stores the node of a particular CPU? cpu_to_node[] clearly is x86- specific. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 07.01.15 at 15:47, wrote: > On 07/01/15 14:42, Boris Ostrovsky wrote: >> I kept this field as an int to be able to store NUMA_NO_NODE which I >> thought to be (int)-1. >> >> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to >> (int)-1 by pxm_to_node(). Given that there is a number of tests for >> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() >> return u8 as well? > > I noticed this as well, and found it quite counter intuitive. > > I would suggest fixing NUMA_NO_NODE to -1 and removing some of the > type-punning. I have to admit that I see no value in wasting 4 bytes for something that for the foreseeable future won't exceed 1 byte. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
>>> On 07.01.15 at 15:45, wrote: > On 01/07/2015 04:12 AM, Jan Beulich wrote: > On 06.01.15 at 14:41, wrote: >>> On 06/01/15 02:18, Boris Ostrovsky wrote: Instead of copying data for each field in xen_sysctl_topologyinfo separately put cpu/socket/node into a single structure and do a single copy for each processor. There is also no need to copy whole op to user at the end, max_cpu_index is sufficient Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the > fact that these are used for CPU topology. Subsequent patch will add support for PCI topology sysctl. Signed-off-by: Boris Ostrovsky >>> If we are going to change the hypercall, then can we see about making it >>> a stable interface (i.e. not a sysctl/domctl)? There are non-toolstack >>> components which might want/need access to this information. (i.e. I am >>> still looking for a reasonable way to get this information from Xen in >>> hwloc) >> In which case leaving the sysctl alone and just adding a new non-sysctl >> interface should be considered. > > I'd expect IO NUMA information to be used together with CPU topology > information so I am not sure how useful this would be. Unless we create > a similar interface for that (CPU/memory) as well. Creating a new CPU topology interface while leaving alone the current sysctl was what I meant to suggest. The I/O topology one would then become a sibling to the CPU one. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/3] x86/xen: don't count how many PFNs are identity mapped
On 01/07/2015 03:47 PM, David Vrabel wrote: This accounting is just used to print a diagnostic message that isn't very useful. Signed-off-by: David Vrabel Reviewed-by: Juergen Gross --- arch/x86/xen/setup.c | 27 +-- 1 file changed, 9 insertions(+), 18 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index dfd77de..664dffc 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -229,15 +229,14 @@ static int __init xen_free_mfn(unsigned long mfn) * as a fallback if the remapping fails. */ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn, - unsigned long end_pfn, unsigned long nr_pages, unsigned long *identity, - unsigned long *released) + unsigned long end_pfn, unsigned long nr_pages, unsigned long *released) { - unsigned long len = 0; unsigned long pfn, end; int ret; WARN_ON(start_pfn > end_pfn); + /* Release pages first. */ end = min(end_pfn, nr_pages); for (pfn = start_pfn; pfn < end; pfn++) { unsigned long mfn = pfn_to_mfn(pfn); @@ -250,16 +249,14 @@ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn, WARN(ret != 1, "Failed to release pfn %lx err=%d\n", pfn, ret); if (ret == 1) { + (*released)++; if (!__set_phys_to_machine(pfn, INVALID_P2M_ENTRY)) break; - len++; } else break; } - /* Need to release pages first */ - *released += len; - *identity += set_phys_range_identity(start_pfn, end_pfn); + set_phys_range_identity(start_pfn, end_pfn); } /* @@ -318,7 +315,6 @@ static void __init xen_do_set_identity_and_remap_chunk( unsigned long ident_pfn_iter, remap_pfn_iter; unsigned long ident_end_pfn = start_pfn + size; unsigned long left = size; - unsigned long ident_cnt = 0; unsigned int i, chunk; WARN_ON(size == 0); @@ -347,8 +343,7 @@ static void __init xen_do_set_identity_and_remap_chunk( xen_remap_mfn = mfn; /* Set identity map */ - ident_cnt += set_phys_range_identity(ident_pfn_iter, - ident_pfn_iter + chunk); + set_phys_range_identity(ident_pfn_iter, ident_pfn_iter + chunk); left -= chunk; } @@ -371,7 +366,7 @@ static void __init xen_do_set_identity_and_remap_chunk( static unsigned long __init xen_set_identity_and_remap_chunk( const struct e820entry *list, size_t map_size, unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages, unsigned long remap_pfn, - unsigned long *identity, unsigned long *released) + unsigned long *released) { unsigned long pfn; unsigned long i = 0; @@ -386,8 +381,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk( /* Do not remap pages beyond the current allocation */ if (cur_pfn >= nr_pages) { /* Identity map remaining pages */ - *identity += set_phys_range_identity(cur_pfn, - cur_pfn + size); + set_phys_range_identity(cur_pfn, cur_pfn + size); break; } if (cur_pfn + size > nr_pages) @@ -398,7 +392,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk( if (!remap_range_size) { pr_warning("Unable to find available pfn range, not remapping identity pages\n"); xen_set_identity_and_release_chunk(cur_pfn, - cur_pfn + left, nr_pages, identity, released); + cur_pfn + left, nr_pages, released); break; } /* Adjust size to fit in current e820 RAM region */ @@ -410,7 +404,6 @@ static unsigned long __init xen_set_identity_and_remap_chunk( /* Update variables to reflect new mappings. */ i += size; remap_pfn += size; - *identity += size; } /* @@ -430,7 +423,6 @@ static void __init xen_set_identity_and_remap( unsigned long *released) { phys_addr_t start = 0; - unsigned long identity = 0; unsigned long last_pfn = nr_pages; const struct e820entry *entry; unsigned long num_released = 0; @@ -460,14 +452,13 @@ static void __init xen_set_identity_and_remap( last_pfn = xen_set_identity_and_remap_chunk( list, map_size, start_pfn, end_pfn, nr_pages, last_pfn, - &identity, &num_
Re: [Xen-devel] [PATCH 2/3] x86/xen: add extra memory for remapped frames during setup
On 01/07/2015 03:47 PM, David Vrabel wrote: If the non-RAM regions in the e820 memory map are larger than the size of the initial balloon, a BUG was triggered as the frames are remaped beyond the limit of the linear p2m. The frames are remapped into the initial balloon area (xen_extra_mem) but not enough of this is available. Ensure enough extra memory regions are added for these remapped frames. Signed-off-by: David Vrabel Reviewed-by: Juergen Gross --- arch/x86/xen/setup.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 664dffc..feb6d86 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -366,7 +366,7 @@ static void __init xen_do_set_identity_and_remap_chunk( static unsigned long __init xen_set_identity_and_remap_chunk( const struct e820entry *list, size_t map_size, unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages, unsigned long remap_pfn, - unsigned long *released) + unsigned long *released, unsigned long *remapped) { unsigned long pfn; unsigned long i = 0; @@ -404,6 +404,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk( /* Update variables to reflect new mappings. */ i += size; remap_pfn += size; + *remapped += size; } /* @@ -420,12 +421,13 @@ static unsigned long __init xen_set_identity_and_remap_chunk( static void __init xen_set_identity_and_remap( const struct e820entry *list, size_t map_size, unsigned long nr_pages, - unsigned long *released) + unsigned long *released, unsigned long *remapped) { phys_addr_t start = 0; unsigned long last_pfn = nr_pages; const struct e820entry *entry; unsigned long num_released = 0; + unsigned long num_remapped = 0; int i; /* @@ -452,12 +454,13 @@ static void __init xen_set_identity_and_remap( last_pfn = xen_set_identity_and_remap_chunk( list, map_size, start_pfn, end_pfn, nr_pages, last_pfn, - &num_released); + &num_released, &num_remapped); start = end; } } *released = num_released; + *remapped = num_remapped; pr_info("Released %ld page(s)\n", num_released); } @@ -577,6 +580,7 @@ char * __init xen_memory_setup(void) struct xen_memory_map memmap; unsigned long max_pages; unsigned long extra_pages = 0; + unsigned long remapped_pages; int i; int op; @@ -626,9 +630,10 @@ char * __init xen_memory_setup(void) * underlying RAM. */ xen_set_identity_and_remap(map, memmap.nr_entries, max_pfn, - &xen_released_pages); + &xen_released_pages, &remapped_pages); extra_pages += xen_released_pages; + extra_pages += remapped_pages; /* * Clamp the amount of extra memory to a EXTRA_MEM_RATIO ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/xen: Free bootmem in free_p2m_page() during early boot
On 07/01/15 14:08, Boris Ostrovsky wrote: > With recent changes in p2m we now have legitimate cases when > p2m memory needs to be freed during early boot (i.e. before > slab is initialized). > > Signed-off-by: Boris Ostrovsky Applied to to stable/for-linus-3.19, thanks. If I understand correctly this didn't fully fix your 32-bit dom0 crashes? David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] x86/xen: optimize get_phys_to_machine()
On 01/07/2015 03:47 PM, David Vrabel wrote: The page table walk is only needed to distinguish between identity and missing, both of which have INVALID_P2M_ENTRY. As get_phys_to_machine is called by __pfn_to_mfn() only which already checks for mfn == INVALID_P2M_ENTRY this optimization will have an effect only in the early boot case with pfn >= xen_p2m_size. I doubt this is necessary. Juergen Signed-off-by: David Vrabel --- arch/x86/xen/p2m.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index edbc7a6..a848201 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -405,8 +405,7 @@ void __init xen_vmalloc_p2m_tree(void) unsigned long get_phys_to_machine(unsigned long pfn) { - pte_t *ptep; - unsigned int level; + unsigned long mfn; if (unlikely(pfn >= xen_p2m_size)) { if (pfn < xen_max_p2m_pfn) @@ -414,19 +413,26 @@ unsigned long get_phys_to_machine(unsigned long pfn) return IDENTITY_FRAME(pfn); } + + mfn = xen_p2m_addr[pfn]; - ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level); - BUG_ON(!ptep || level != PG_LEVEL_4K); + if (unlikely(mfn == INVALID_P2M_ENTRY)) { + pte_t *ptep; + unsigned int level; - /* -* The INVALID_P2M_ENTRY is filled in both p2m_*identity -* and in p2m_*missing, so returning the INVALID_P2M_ENTRY -* would be wrong. -*/ - if (pte_pfn(*ptep) == PFN_DOWN(__pa(p2m_identity))) - return IDENTITY_FRAME(pfn); + ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level); + BUG_ON(!ptep || level != PG_LEVEL_4K); + + /* +* The INVALID_P2M_ENTRY is filled in both p2m_*identity +* and in p2m_*missing, so returning the INVALID_P2M_ENTRY +* would be wrong. +*/ + if (pte_pfn(*ptep) == PFN_DOWN(__pa(p2m_identity))) + return IDENTITY_FRAME(pfn); + } - return xen_p2m_addr[pfn]; + return mfn; } EXPORT_SYMBOL_GPL(get_phys_to_machine); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
>>> On 07.01.15 at 15:55, wrote: > On 01/07/2015 04:21 AM, Jan Beulich wrote: > On 06.01.15 at 03:18, wrote: >>> +for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ ) >>> +{ >>> +xen_sysctl_pcitopo_t pcitopo; >>> +struct pci_dev *pdev; >>> + >>> +if ( copy_from_guest_offset(&pcitopo, ti->pcitopo, >>> +ti->first_dev, 1) ) >>> +{ >>> +ret = -EFAULT; >>> +break; >>> +} >>> + >>> +spin_lock(&pcidevs_lock); >>> +pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus, >>> +pcitopo.pcidev.devfn); >>> +if ( !pdev || (pdev->node == NUMA_NO_NODE) ) >>> +pcitopo.node = INVALID_TOPOLOGY_ID; >>> +else >>> +pcitopo.node = pdev->node; >> Are hypervisor-internal node numbers really meaningful to the caller? > > > This is the same information (pxm -> node mapping ) that we provide in > XEN_SYSCTL_topologyinfo (renamed in this series to > XEN_SYSCTL_cputopoinfo). Given that I expect the two topologies to be > used together I think the answer is yes. Building your argumentation on potentially mis-designed existing interfaces is bogus. The question is - what use is a Xen internal node number to a caller of a particular hypercall (other than it being purely informational, e.g. for printing human readable output)? In particular if we were to introduce a new non-sysctl interface, determining whether the hypervisor internal representation is really the right one to expose here should be one of the most important design aspects. I personally think that exposing e.g. the firmware determined (and hence hopefully stable across reboots) PXM would be more reasonable. >>> @@ -463,7 +464,7 @@ typedef struct xen_sysctl_lockprof_op > xen_sysctl_lockprof_op_t; >>> DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t); >>> >>> /* XEN_SYSCTL_cputopoinfo */ >>> -#define INVALID_TOPOLOGY_ID (~0U) >>> +#define INVALID_TOPOLOGY_ID (~0U) /* Also used by pcitopo */ >> Better extend the preceding comment. > > I mentioned it to Wei yesterday that the file is structured (or at least > appears to me to be structured) in such a way that these top comments > mark sections of definitions for each sysctl. And so I thought that I'd > be breaking this convention if I were to extend the comment. Yeah, I saw that discussion after having replied. Looking more closely, I think the placement of the INVALID_TOPOLOGY_ID definition is sub-optimal when you want to use it in a second place. Move it mode towards the beginning of the header, leave the current comment as is, and if you feel so ad a new comment ahead of it explaining which operations are using it. And then, if we go the non-sysctl route, the definition would likely need moving into xen.h anyway. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/xen: Free bootmem in free_p2m_page() during early boot
On 01/07/2015 10:10 AM, David Vrabel wrote: On 07/01/15 14:08, Boris Ostrovsky wrote: With recent changes in p2m we now have legitimate cases when p2m memory needs to be freed during early boot (i.e. before slab is initialized). Signed-off-by: Boris Ostrovsky Applied to to stable/for-linus-3.19, thanks. If I understand correctly this didn't fully fix your 32-bit dom0 crashes? No, this only allowed me to be alive for a few more microseconds ;-) -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/2] Two libxl misc patches
These changes were requested by Ian Campbell when relevant patches were posted but they didn't warrant a resend at that time. Wei Liu (2): libxl_internal: lock_carefd -> carefd libxl_internal: comment on domain userdata unlock function tools/libxl/libxl_internal.c | 20 +--- tools/libxl/libxl_internal.h |2 +- 2 files changed, 18 insertions(+), 4 deletions(-) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel