Re: [Xen-devel] [PATCH 17/22] xen/arm: p2m: Don't need to restore the state for an idle vCPU.
On 07/20/2016 06:10 PM, Julien Grall wrote: > The function p2m_restore_state could be called with an idle vCPU in > arguments (when called by construct_dom0). However, we will never return > to EL0/EL1 in this case, so it is not necessary to restore the p2m > registers. > I absolutely agree. Cheers, ~Sergej > Signed-off-by: Julien Grall > --- > xen/arch/arm/p2m.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index c52081a..d1b6009 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -127,6 +127,9 @@ void p2m_restore_state(struct vcpu *n) > { > register_t hcr; > > +if ( is_idle_vcpu(n) ) > +return; > + > hcr = READ_SYSREG(HCR_EL2); > WRITE_SYSREG(hcr & ~HCR_VM, HCR_EL2); > isb(); > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 16/22] xen/arm: p2m: Move the vttbr field from arch_domain to p2m_domain
Hi Julien, On 07/20/2016 06:10 PM, Julien Grall wrote: > The field vttbr holds the base address of the translation table for > guest. Its value will depends on how the p2m has been initialized and > will only be used by the code code. > > So move the field from arch_domain to p2m_domain. This will also ease > the implementation of altp2m. > --- > xen/arch/arm/p2m.c | 11 +++ > xen/arch/arm/traps.c | 2 +- > xen/include/asm-arm/domain.h | 1 - > xen/include/asm-arm/p2m.h| 3 +++ > 4 files changed, 11 insertions(+), 6 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index c407e6a..c52081a 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -107,10 +107,14 @@ void dump_p2m_lookup(struct domain *d, paddr_t addr) > > static void p2m_load_VTTBR(struct domain *d) > { > +struct p2m_domain *p2m = &d->arch.p2m; > + This is ok to me. Further altp2m implementation can easily extend this code base. Also, as your patch (patch #17) as well eliminates the possibility that the idle domain reaches this function the following check could be potentially removed (as far as I know, p2m_load_VTTBR is reached only through p2m_restore_state). > if ( is_idle_domain(d) ) > return; > -BUG_ON(!d->arch.vttbr); > -WRITE_SYSREG64(d->arch.vttbr, VTTBR_EL2); > + > +ASSERT(p2m->vttbr); > + > +WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2); > isb(); /* Ensure update is visible */ > } > > @@ -1298,8 +1302,7 @@ static int p2m_alloc_table(struct domain *d) > > p2m->root = page; > > -d->arch.vttbr = page_to_maddr(p2m->root) > -| ((uint64_t)p2m->vmid&0xff)<<48; > +p2m->vttbr = page_to_maddr(p2m->root) | ((uint64_t)p2m->vmid & 0xff) << > 48; > > /* > * Make sure that all TLBs corresponding to the new VMID are flushed > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > index 06a8ee5..65c6fb4 100644 > --- a/xen/arch/arm/traps.c > +++ b/xen/arch/arm/traps.c > @@ -880,7 +880,7 @@ void vcpu_show_registers(const struct vcpu *v) > ctxt.ifsr32_el2 = v->arch.ifsr; > #endif > > -ctxt.vttbr_el2 = v->domain->arch.vttbr; > +ctxt.vttbr_el2 = v->domain->arch.p2m.vttbr; > > _show_registers(&v->arch.cpu_info->guest_cpu_user_regs, &ctxt, 1, v); > } > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4e9d8bf..9452fcd 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -48,7 +48,6 @@ struct arch_domain > > /* Virtual MMU */ > struct p2m_domain p2m; > -uint64_t vttbr; > > struct hvm_domain hvm_domain; > gfn_t *grant_table_gfn; > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h > index ce28e8a..53c4d78 100644 > --- a/xen/include/asm-arm/p2m.h > +++ b/xen/include/asm-arm/p2m.h > @@ -32,6 +32,9 @@ struct p2m_domain { > /* Current VMID in use */ > uint8_t vmid; > > +/* Current Translation Table Base Register for the p2m */ > +uint64_t vttbr; > + > /* > * Highest guest frame that's ever been mapped in the p2m > * Only takes into account ram and foreign mapping > Cheers, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote: > > On 07/21/2016 04:57 PM, Roger Pau Monné wrote: > > On Fri, Jul 15, 2016 at 05:31:49PM +0800, Bob Liu wrote: > >> The current VBD layer reserves buffer space for each attached device based > >> on > >> three statically configured settings which are read at boot time. > >> * max_indirect_segs: Maximum amount of segments. > >> * max_ring_page_order: Maximum order of pages to be used for the shared > >> ring. > >> * max_queues: Maximum of queues(rings) to be used. > >> > >> But the storage backend, workload, and guest memory result in very > >> different > >> tuning requirements. It's impossible to centrally predict application > >> characteristics so it's best to leave allow the settings can be dynamiclly > >> adjusted based on workload inside the Guest. > >> > >> Usage: > >> Show current values: > >> cat /sys/devices/vbd-xxx/max_indirect_segs > >> cat /sys/devices/vbd-xxx/max_ring_page_order > >> cat /sys/devices/vbd-xxx/max_queues > >> > >> Write new values: > >> echo > /sys/devices/vbd-xxx/max_indirect_segs > >> echo > /sys/devices/vbd-xxx/max_ring_page_order > >> echo > /sys/devices/vbd-xxx/max_queues > >> > >> Signed-off-by: Bob Liu > >> -- > >> v2: Add device lock and other comments from Konrad. > >> --- > >> drivers/block/xen-blkfront.c | 285 > >> ++- > >> 1 file changed, 283 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > >> index 10f46a8..9a5ed22 100644 > >> --- a/drivers/block/xen-blkfront.c > >> +++ b/drivers/block/xen-blkfront.c > >> @@ -46,6 +46,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> > >> #include > >> #include > >> @@ -212,6 +213,11 @@ struct blkfront_info > >>/* Save uncomplete reqs and bios for migration. */ > >>struct list_head requests; > >>struct bio_list bio_list; > >> + /* For dynamic configuration. */ > >> + unsigned int reconfiguring:1; > >> + int new_max_indirect_segments; > >> + int new_max_ring_page_order; > >> + int new_max_queues; > >> }; > >> > >> static unsigned int nr_minors; > >> @@ -1350,6 +1356,31 @@ static void blkif_free(struct blkfront_info *info, > >> int suspend) > >>for (i = 0; i < info->nr_rings; i++) > >>blkif_free_ring(&info->rinfo[i]); > >> > >> + /* Remove old xenstore nodes. */ > >> + if (info->nr_ring_pages > 1) > >> + xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-page-order"); > >> + > >> + if (info->nr_rings == 1) { > >> + if (info->nr_ring_pages == 1) { > >> + xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-ref"); > >> + } else { > >> + for (i = 0; i < info->nr_ring_pages; i++) { > >> + char ring_ref_name[RINGREF_NAME_LEN]; > >> + > >> + snprintf(ring_ref_name, RINGREF_NAME_LEN, > >> "ring-ref%u", i); > >> + xenbus_rm(XBT_NIL, info->xbdev->nodename, > >> ring_ref_name); > >> + } > >> + } > >> + } else { > >> + xenbus_rm(XBT_NIL, info->xbdev->nodename, > >> "multi-queue-num-queues"); > >> + > >> + for (i = 0; i < info->nr_rings; i++) { > >> + char queuename[QUEUE_NAME_LEN]; > >> + > >> + snprintf(queuename, QUEUE_NAME_LEN, "queue-%u", i); > >> + xenbus_rm(XBT_NIL, info->xbdev->nodename, queuename); > >> + } > >> + } > >>kfree(info->rinfo); > >>info->rinfo = NULL; > >>info->nr_rings = 0; > >> @@ -1772,6 +1803,10 @@ static int talk_to_blkback(struct xenbus_device > >> *dev, > >>info->nr_ring_pages = 1; > >>else { > >>ring_page_order = min(xen_blkif_max_ring_order, max_page_order); > >> + if (info->new_max_ring_page_order) { > > > > Instead of calling this "new_max_ring_page_order", could you just call it > > max_ring_page_order, iniitalize it to xen_blkif_max_ring_order by default > > > Sure, I can do that. > > > > and use it everywhere instead of xen_blkif_max_ring_order? > > > But "xen_blkif_max_ring_order" still have to be used here, this is the only > place "xen_blkif_max_ring_order" is used(except checking the value of it in > xlblk_init()). > > > > > >> + BUG_ON(info->new_max_ring_page_order > max_page_order); > >> + ring_page_order = info->new_max_ring_page_order; > >> + } > >>info->nr_ring_pages = 1 << ring_page_order; > >>} > >> > >> @@ -1895,6 +1930,10 @@ static int negotiate_mq(struct blkfront_info *info) > >>backend_max_queues = 1; > >> > >>info->nr_rings = min(backend_max_queues, xen_blkif_max_queues); > >> + if (info->new_max_queues) { > > > > Same here IMHO, this is going to make the code flow slightly easier to > > understand. > > > >> + BUG_ON(info->new_max_queues > backend_max_queues); > >
Re: [Xen-devel] [PATCH 18/22] xen/arm: p2m: Rework the context switch to another VTTBR in flush_tlb_domain
Hi Julien, On 07/20/2016 06:11 PM, Julien Grall wrote: > The current implementation of flush_tlb_domain is relying on the domain > to have a single p2m. With the upcoming feature altp2m, a single domain > may have different p2m. So we would need to switch to the correct p2m in > order to flush the TLBs. > > Rather than checking whether the domain is not the current domain, check > whether the VTTBR is different. The resulting assembly code is much > smaller: from 38 instructions (+ 2 functions call) to 22 instructions. > > Signed-off-by: Julien Grall > --- > xen/arch/arm/p2m.c | 18 +++--- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index d1b6009..015c1e8 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -151,24 +151,28 @@ void p2m_restore_state(struct vcpu *n) > > void flush_tlb_domain(struct domain *d) > { > +struct p2m_domain *p2m = &d->arch.p2m; > unsigned long flags = 0; > +uint64_t ovttbr; > > /* > - * Update the VTTBR if necessary with the domain d. In this case, > - * it's only necessary to flush TLBs on every CPUs with the current VMID > - * (our domain). > + * ARM only provides an instruction to flush TLBs for the current > + * VMID. So switch to the VTTBR of a given P2M if different. > */ > -if ( d != current->domain ) > +ovttbr = READ_SYSREG64(VTTBR_EL2); > +if ( ovttbr != p2m->vttbr ) > { > local_irq_save(flags); > -p2m_load_VTTBR(d); > +WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2); > +isb(); > } > > flush_tlb(); > > -if ( d != current->domain ) > +if ( ovttbr != READ_SYSREG64(VTTBR_EL2) ) > { > -p2m_load_VTTBR(current->domain); > +WRITE_SYSREG64(ovttbr, VTTBR_EL2); > +isb(); > local_irq_restore(flags); > } > } > Thank you for this implementation change. Since the upcoming altp2m feature uses a VMID per p2m view, it makes absolutely sense to directly look for differing VTTBRs. Cheers, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 19/22] xen/arm: p2m: Inline p2m_load_VTTBR into p2m_restore_state
Hi Julien, On 07/20/2016 06:11 PM, Julien Grall wrote: > p2m_restore_state is the last caller of p2m_load_VTTBR and already check > if the vCPU does not belong to the idle domain. > > Note that it is likely possible to remove some isb in the function > p2m_restore_state, however this is not the purpose of this patch. So the > numerous isb have been left. > Right now, I don't see any issues with removing the p2m_load_VTTBR function in combination with changes applied to flush_tlb_domain in your patch #18 and #17. However, I am not entirely sure whether it makes sense to entirely remove the function and replicate the VTTBR loading functionality across multiple functions. Why don't we just provide a struct p2m_domain* to p2m_load_VTTBR (potentially with a backpointer to the associated domain, as it is shown in the arm/altp2m patch) and use the function inline? Cheers, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
On 07/22/2016 03:45 PM, Roger Pau Monné wrote: > On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote: >> >> On 07/21/2016 04:57 PM, Roger Pau Monné wrote: ..[snip].. + +static ssize_t dynamic_reconfig_device(struct blkfront_info *info, ssize_t count) +{ + unsigned int i; + int err = -EBUSY; + + /* + * Make sure no migration in parallel, device lock is actually a + * mutex. + */ + if (!device_trylock(&info->xbdev->dev)) { + pr_err("Fail to acquire dev:%s lock, may be in migration.\n", + dev_name(&info->xbdev->dev)); + return err; + } + + /* + * Prevent new requests and guarantee no uncompleted reqs. + */ + blk_mq_freeze_queue(info->rq); + if (part_in_flight(&info->gd->part0)) + goto out; + + /* + * FrontBackend + * Switch to XenbusStateClosed + * frontend_changed(): + * case XenbusStateClosed: + * xen_blkif_disconnect() + * Switch to XenbusStateClosed + * blkfront_resume(): + * frontend_changed(): + * reconnect + * Wait until XenbusStateConnected + */ + info->reconfiguring = true; + xenbus_switch_state(info->xbdev, XenbusStateClosed); + + /* Poll every 100ms, 1 minute timeout. */ + for (i = 0; i < 600; i++) { + /* + * Wait backend enter XenbusStateClosed, blkback_changed() + * will clear reconfiguring. + */ + if (!info->reconfiguring) + goto resume; + schedule_timeout_interruptible(msecs_to_jiffies(100)); + } >>> >>> Instead of having this wait, could you just set info->reconfiguring = 1, >>> set >>> the frontend state to XenbusStateClosed and mimic exactly what a resume >>> from >>> suspension does? blkback_changed would have to set the frontend state to >>> InitWait when it detects that the backend has switched to Closed, and call >>> blkfront_resume. >> >> >> I think that won't work. >> In the real "resume" case, the power management system will trigger all >> ->resume() path. >> But there is no place for dynamic configuration. > > Hello, > > I think it should be possible to set info->reconfiguring and wait for the > backend to switch to state Closed, at that point we should call blkif_resume > (from blkback_changed) and the backend will follow the reconection. > Okay, I get your point. Yes, that's an option. But this will make 'dynamic configuration' to be async, I'm worry about the end-user will get panic. E.g A end-user "echo > /sys/devices/vbd-xxx/max_indirect_segs", but then the device will be Closed and disappeared, the user have to wait for a random time so that the device can resume. -- Regards, -Bob ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
Hi Julien, > -int p2m_alloc_table(struct domain *d) > +static int p2m_alloc_table(struct domain *d) While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, the function p2m_alloc_table needs to be called from ./xen/arch/arm/altp2m.c to allocate the individual altp2m views. Hence it should not be static. However, this requirement is clearly part of an entirely patch, which will be introduced in the near future and hence can be discussed there. Cheers, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 20/22] xen/arm: Don't export flush_tlb_domain
Hi Julien, On 07/20/2016 06:11 PM, Julien Grall wrote: > The function flush_tlb_domain is not used outside of the file where it > has been declared. > As for patch #15, the same applies here too: For altp2m, flush_tlb_domain/p2m_flush_tlb should be made available to ./xen/arch/arm/altp2m.c. Cheers, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
Thursday, July 21, 2016, 12:18:37 PM, you wrote: > c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X > table infrastructure not to always be initialised, but it missed one path > which needed an is-initialised check. > If a devices is passed through to a domain which is MSI capable but not MSI-X > capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq > hypercall still calls into msixtbl_pt_unregister(). This follows the linked > list pointer which is still NULL. > Introduce an is-initalised check to msixtbl_pt_unregister(). > Furthermore, the purpose of the open-coded msixtbl_list.next check is rather > subtle. Introduce an msixtbl_initialised() predicate instead, which makes its > purpose far more obvious. > Reported-by: Sander Eikelenboom > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Sander Eikelenboom > Sander - would you mind double checking this patch? > --- Hi Andrew, Just got the chance to test and it works for me ! Thanks, Sander > xen/arch/x86/hvm/vmsi.c | 16 +--- > 1 file changed, 13 insertions(+), 3 deletions(-) > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index e418b98..ef1dfff 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -166,6 +166,16 @@ struct msixtbl_entry > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > > +/* > + * MSI-X table infrastructure is dynamically initialised when an MSI-X > capable > + * device is passed through to a domain, rather than unconditionally for all > + * domains. > + */ > +static bool msixtbl_initialised(const struct domain *d) > +{ +return !!d->>arch.hvm_domain.msixtbl_list.next; > +} > + > static struct msixtbl_entry *msixtbl_find_entry( > struct vcpu *v, unsigned long addr) > { > @@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq > *pirq) > ASSERT(pcidevs_locked()); > ASSERT(spin_is_locked(&d->event_lock)); > > -if ( !has_vlapic(d) ) > +if ( !msixtbl_initialised(d) ) > return; > > irq_desc = pirq_spin_lock_irq_desc(pirq, NULL); > @@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d) > struct hvm_io_handler *handler; > > if ( !has_hvm_container_domain(d) || !has_vlapic(d) || - d->>arch.hvm_domain.msixtbl_list.next ) > + msixtbl_initialised(d) ) > return; > > INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list); > @@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d) > { > struct msixtbl_entry *entry, *temp; > -if ( !d->>arch.hvm_domain.msixtbl_list.next ) > +if ( !msixtbl_initialised(d) ) > return; > > spin_lock(&d->event_lock); ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
On 18.07.2016 18:44, Andreas Ziegler wrote: did the information that Ingo provided (i cited his message to the list below) maybe help in narrowing down the possible issue? If you need additional information we can try getting it for you, as Ingo might be able to reproduce the kernel panic, although not reliably. In the meanwhile, I activated IPv6 again on Tuesday evening and today the server crashed again some minutes ago. Here's the output from netconsole: Jul 22 10:27:55 31.172.31.251 [171182.273973] BUG: unable to handle kernel Jul 22 10:27:55 31.172.31.251 at 88025b2d Jul 22 10:27:55 31.172.31.251 [171182.274039] IP: Jul 22 10:27:55 31.172.31.251 [] memcpy+0x6/0x110 Jul 22 10:27:55 31.172.31.251 [171182.274080] PGD 1814067 Jul 22 10:27:55 31.172.31.251 Jul 22 10:27:55 31.172.31.251 [171182.274130] Oops: [#1] Jul 22 10:27:55 31.172.31.251 Jul 22 10:27:55 31.172.31.251 [171182.274171] Modules linked in: Jul 22 10:27:55 31.172.31.251 nls_utf8 Jul 22 10:27:55 31.172.31.251 btrfs Jul 22 10:27:55 31.172.31.251 xor Jul 22 10:27:55 31.172.31.251 raid6_pq Jul 22 10:27:55 31.172.31.251 ufs Jul 22 10:27:55 31.172.31.251 qnx4 Jul 22 10:27:55 31.172.31.251 hfsplus Jul 22 10:27:55 31.172.31.251 hfs Jul 22 10:27:55 31.172.31.251 minix Jul 22 10:27:55 31.172.31.251 ntfs Jul 22 10:27:55 31.172.31.251 vfat Jul 22 10:27:55 31.172.31.251 msdos Jul 22 10:27:55 31.172.31.251 fat Jul 22 10:27:55 31.172.31.251 jfs Jul 22 10:27:55 31.172.31.251 authenc Jul 22 10:27:55 31.172.31.251 xfrm6_mode_tunnel Jul 22 10:27:55 31.172.31.251 xfrm4_mode_tunnel Jul 22 10:27:55 31.172.31.251 xt_physdev Jul 22 10:27:55 31.172.31.251 xen_netback Jul 22 10:27:55 31.172.31.251 tun Jul 22 10:27:55 31.172.31.251 xen_blkback Jul 22 10:27:55 31.172.31.251 xt_multiport Jul 22 10:27:55 31.172.31.251 twofish_generic Jul 22 10:27:55 31.172.31.251 twofish_avx_x86_64 Jul 22 10:27:55 31.172.31.251 twofish_x86_64_3way Jul 22 10:27:55 31.172.31.251 twofish_x86_64 Jul 22 10:27:55 31.172.31.251 twofish_common Jul 22 10:27:55 31.172.31.251 serpent_avx_x86_64 Jul 22 10:27:55 31.172.31.251 serpent_sse2_x86_64 Jul 22 10:27:55 31.172.31.251 serpent_generic Jul 22 10:27:55 31.172.31.251 blowfish_generic Jul 22 10:27:55 31.172.31.251 blowfish_x86_64 Jul 22 10:27:55 31.172.31.251 blowfish_common Jul 22 10:27:55 31.172.31.251 cast5_avx_x86_64 Jul 22 10:27:55 31.172.31.251 cast5_generic Jul 22 10:27:55 31.172.31.251 cast_common Jul 22 10:27:55 31.172.31.251 seqiv Jul 22 10:27:55 31.172.31.251 ctr Jul 22 10:27:55 31.172.31.251 ecb Jul 22 10:27:55 31.172.31.251 des_generic Jul 22 10:27:55 31.172.31.251 cbc Jul 22 10:27:55 31.172.31.251 algif_skcipher Jul 22 10:27:55 31.172.31.251 camellia_generic Jul 22 10:27:55 31.172.31.251 camellia_aesni_avx_x86_64 Jul 22 10:27:55 31.172.31.251 camellia_x86_64 Jul 22 10:27:55 31.172.31.251 xts Jul 22 10:27:55 31.172.31.251 xcbc Jul 22 10:27:55 31.172.31.251 hmac Jul 22 10:27:55 31.172.31.251 sha512_ssse3 Jul 22 10:27:55 31.172.31.251 sha512_generic Jul 22 10:27:55 31.172.31.251 sha256_ssse3 Jul 22 10:27:55 31.172.31.251 sha256_generic Jul 22 10:27:55 31.172.31.251 md4 Jul 22 10:27:55 31.172.31.251 algif_hash Jul 22 10:27:55 31.172.31.251 af_alg Jul 22 10:27:55 31.172.31.251 ipt_ULOG Jul 22 10:27:55 31.172.31.251 xfrm_user Jul 22 10:27:55 31.172.31.251 xfrm4_tunnel Jul 22 10:27:55 31.172.31.251 xen_gntdev Jul 22 10:27:55 31.172.31.251 tunnel4 Jul 22 10:27:55 31.172.31.251 xen_evtchn Jul 22 10:27:55 31.172.31.251 xenfs Jul 22 10:27:55 31.172.31.251 xen_privcmd Jul 22 10:27:55 31.172.31.251 ipcomp Jul 22 10:27:55 31.172.31.251 xfrm_ipcomp Jul 22 10:27:55 31.172.31.251 esp4 Jul 22 10:27:55 31.172.31.251 ah4 Jul 22 10:27:55 31.172.31.251 af_key Jul 22 10:27:55 31.172.31.251 xfrm_algo Jul 22 10:27:55 31.172.31.251 ipmi_poweroff Jul 22 10:27:55 31.172.31.251 video Jul 22 10:27:55 31.172.31.251 thermal Jul 22 10:27:55 31.172.31.251 fan Jul 22 10:27:55 31.172.31.251 ac Jul 22 10:27:55 31.172.31.251 battery Jul 22 10:27:55 31.172.31.251 ip6t_REJECT Jul 22 10:27:55 31.172.31.251 ip6table_filter Jul 22 10:27:55 31.172.31.251 ip6_tables Jul 22 10:27:55 31.172.31.251 bridge Jul 22 10:27:55 31.172.31.251 stp Jul 22 10:27:55 31.172.31.251 llc Jul 22 10:27:55 31.172.31.251 ipt_REJECT Jul 22 10:27:55 31.172.31.251 xt_tcpudp Jul 22 10:27:55 31.172.31.251 iptable_filter Jul 22 10:27:55 31.172.31.251 ip_tables Jul 22 10:27:55 31.172.31.251 x_tables Jul 22 10:27:55 31.172.31.251 ext4 Jul 22 10:27:55 31.172.31.251 crc16 Jul 22 10:27:55 31.172.31.251 mbcache Jul 22 10:27:55 31.172.31.251 jbd2 Jul 22 10:27:55 31.172.31.251 fuse Jul 22 10:27:55 31.172.31.251 ipmi_devintf Jul 22 10:27:55 31.172.31.251 ipmi_watchdog Jul 22 10:27:55 31.172.31.251 w83627ehf Jul 22 10:27:55 31.172.31.251 hwmon_vid Jul 22 10:27:55 31.172.31.251 nf_conntrack_ipv4 Jul 22 10:27:55 31.172.31.251 nf_defrag_ipv4 Jul 22 10:27:55 31.172.31.251 nf_conntrack Jul 22 10:
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
On 22/07/16 09:32, Sergej Proskurin wrote: Hi Julien, Hello Sergej, -int p2m_alloc_table(struct domain *d) +static int p2m_alloc_table(struct domain *d) While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, the function p2m_alloc_table needs to be called from ./xen/arch/arm/altp2m.c to allocate the individual altp2m views. Hence it should not be static. No, this function should not be called outside p2m.c as it will not fully initialize the p2m. You need to need to provide a function to initialize a p2m (such as p2m_init). Regards. -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 16/22] xen/arm: p2m: Move the vttbr field from arch_domain to p2m_domain
On 22/07/16 08:46, Sergej Proskurin wrote: Hi Julien, Hello Sergej, On 07/20/2016 06:10 PM, Julien Grall wrote: The field vttbr holds the base address of the translation table for guest. Its value will depends on how the p2m has been initialized and will only be used by the code code. So move the field from arch_domain to p2m_domain. This will also ease the implementation of altp2m. --- xen/arch/arm/p2m.c | 11 +++ xen/arch/arm/traps.c | 2 +- xen/include/asm-arm/domain.h | 1 - xen/include/asm-arm/p2m.h| 3 +++ 4 files changed, 11 insertions(+), 6 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index c407e6a..c52081a 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -107,10 +107,14 @@ void dump_p2m_lookup(struct domain *d, paddr_t addr) static void p2m_load_VTTBR(struct domain *d) { +struct p2m_domain *p2m = &d->arch.p2m; + This is ok to me. Further altp2m implementation can easily extend this code base. Also, as your patch (patch #17) as well eliminates the possibility that the idle domain reaches this function the following check could be potentially removed (as far as I know, p2m_load_VTTBR is reached only through p2m_restore_state). I will not remove this check because this is not the goal of this patch and each patch should boot Xen without any dependencies on a follow-up patch: p2m_load_VTTBR is used flush_tlb_domain until patch #18 and p2m_restore_state does not yet have the is_idle_* check (it will be added in patch #17). Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] Locking on vm-event operations (monitor)
Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? As a resolution for this, I've been thinking of adding a 'subsys_lock' field in the vm_event_domain structure, either spinlock or rw-lock, which would be initialised/uninitialised when d.vm_event is allocated/freed (domain_create()/complete_domain_destroy()). Let me know what you think, Corneliu. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 19/22] xen/arm: p2m: Inline p2m_load_VTTBR into p2m_restore_state
On 22/07/16 09:07, Sergej Proskurin wrote: Hi Julien, Hello Sergej, On 07/20/2016 06:11 PM, Julien Grall wrote: p2m_restore_state is the last caller of p2m_load_VTTBR and already check if the vCPU does not belong to the idle domain. Note that it is likely possible to remove some isb in the function p2m_restore_state, however this is not the purpose of this patch. So the numerous isb have been left. Right now, I don't see any issues with removing the p2m_load_VTTBR function in combination with changes applied to flush_tlb_domain in your patch #18 and #17. However, I am not entirely sure whether it makes sense to entirely remove the function and replicate the VTTBR loading functionality across multiple functions. Why don't we just provide a struct p2m_domain* to p2m_load_VTTBR (potentially with a backpointer to the associated domain, as it is shown in the arm/altp2m patch) and use the function inline? Because ideally this function should take a p2m in parameter and a p2m cannot belong to an idle domain. So the function would be: WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2); isb(); However, in the case of p2m_restore_state the isb() is not necessary and will impact the performance. Yes, I know the function contains a lots of pointless isb(), this needs to be fixed at some point. So overall, this function is not necessary. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH XTF 3/3] xtf-runner: regularise runner exit code
On Thu, Jul 21, 2016 at 08:04:59PM +0100, Andrew Cooper wrote: > On 21/07/2016 16:44, Wei Liu wrote: > > Report the first "ERROR" and "FAILURE" if found, otherwise report "SKIP" > > if found. Eventually if everything is ok the exit code will be 0. > > > > See runner code for numeric exit code space. > > > > Signed-off-by: Wei Liu > > --- > > xtf-runner | 12 +--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/xtf-runner b/xtf-runner > > index 17ce933..ebe5c27 100755 > > --- a/xtf-runner > > +++ b/xtf-runner > > @@ -249,17 +249,23 @@ def run_tests(args): > > if not len(tests): > > raise RunnerError("No tests to run") > > > > -rc = 0 > > +rc = exit_code('SUCCESS') > > This logic would be easier to express if you use the indices of > all_results as a measure of severity. > > e.g. > > rc = all_results.index('SUCCESS') > > > results = [] > > > > for test in tests: > > > > res = run_test(test) > > -if res != "SUCCESS": > > -rc = 1 > > +if res in ("ERROR", "FAILURE") and rc == exit_code('SUCCESS'): > > +rc = exit_code(res) > > res_idx = all_results.index(res) > if res_idx > rc: > rc = res_idx I intended to report the first "error" or "failure" encountered. This would cause a FAILURE to overwrite previous ERROR result. Is that what you want? Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 20/22] xen/arm: Don't export flush_tlb_domain
On 22/07/16 09:54, Sergej Proskurin wrote: Hi Julien, Hello Sergej, On 07/20/2016 06:11 PM, Julien Grall wrote: The function flush_tlb_domain is not used outside of the file where it has been declared. As for patch #15, the same applies here too: For altp2m, flush_tlb_domain/p2m_flush_tlb should be made available to ./xen/arch/arm/altp2m.c. Based on your previous version, I don't see any reason to flush call flush_tlb_domain/p2m_flush_tlb in altp2m. Please justify why you would need it. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
On Fri, Jul 22, 2016 at 04:17:48PM +0800, Bob Liu wrote: > > On 07/22/2016 03:45 PM, Roger Pau Monné wrote: > > On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote: > >> > >> On 07/21/2016 04:57 PM, Roger Pau Monné wrote: > ..[snip].. > + > +static ssize_t dynamic_reconfig_device(struct blkfront_info *info, > ssize_t count) > +{ > +unsigned int i; > +int err = -EBUSY; > + > +/* > + * Make sure no migration in parallel, device lock is actually a > + * mutex. > + */ > +if (!device_trylock(&info->xbdev->dev)) { > +pr_err("Fail to acquire dev:%s lock, may be in > migration.\n", > +dev_name(&info->xbdev->dev)); > +return err; > +} > + > +/* > + * Prevent new requests and guarantee no uncompleted reqs. > + */ > +blk_mq_freeze_queue(info->rq); > +if (part_in_flight(&info->gd->part0)) > +goto out; > + > +/* > + * FrontBackend > + * Switch to XenbusStateClosed > + * frontend_changed(): > + * case XenbusStateClosed: > + * > xen_blkif_disconnect() > + * Switch to > XenbusStateClosed > + * blkfront_resume(): > + * frontend_changed(): > + * reconnect > + * Wait until XenbusStateConnected > + */ > +info->reconfiguring = true; > +xenbus_switch_state(info->xbdev, XenbusStateClosed); > + > +/* Poll every 100ms, 1 minute timeout. */ > +for (i = 0; i < 600; i++) { > +/* > + * Wait backend enter XenbusStateClosed, > blkback_changed() > + * will clear reconfiguring. > + */ > +if (!info->reconfiguring) > +goto resume; > +schedule_timeout_interruptible(msecs_to_jiffies(100)); > +} > >>> > >>> Instead of having this wait, could you just set info->reconfiguring = 1, > >>> set > >>> the frontend state to XenbusStateClosed and mimic exactly what a resume > >>> from > >>> suspension does? blkback_changed would have to set the frontend state to > >>> InitWait when it detects that the backend has switched to Closed, and > >>> call > >>> blkfront_resume. > >> > >> > >> I think that won't work. > >> In the real "resume" case, the power management system will trigger all > >> ->resume() path. > >> But there is no place for dynamic configuration. > > > > Hello, > > > > I think it should be possible to set info->reconfiguring and wait for the > > backend to switch to state Closed, at that point we should call > > blkif_resume > > (from blkback_changed) and the backend will follow the reconection. > > > > Okay, I get your point. Yes, that's an option. > > But this will make 'dynamic configuration' to be async, I'm worry about the > end-user will get panic. > E.g > A end-user "echo > /sys/devices/vbd-xxx/max_indirect_segs", > but then the device will be Closed and disappeared, the user have to wait for > a random time so that the device can resume. That should not happen, AFAICT on migration the device never dissapears. alloc_disk and friends should not be called on resume from migration (see the switch in blkfront_connect, you should take the BLKIF_STATE_SUSPENDED path for the reconfiguration). Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH XTF 3/3] xtf-runner: regularise runner exit code
On 22/07/16 10:29, Wei Liu wrote: On Thu, Jul 21, 2016 at 08:04:59PM +0100, Andrew Cooper wrote: On 21/07/2016 16:44, Wei Liu wrote: Report the first "ERROR" and "FAILURE" if found, otherwise report "SKIP" if found. Eventually if everything is ok the exit code will be 0. See runner code for numeric exit code space. Signed-off-by: Wei Liu --- xtf-runner | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/xtf-runner b/xtf-runner index 17ce933..ebe5c27 100755 --- a/xtf-runner +++ b/xtf-runner @@ -249,17 +249,23 @@ def run_tests(args): if not len(tests): raise RunnerError("No tests to run") -rc = 0 +rc = exit_code('SUCCESS') This logic would be easier to express if you use the indices of all_results as a measure of severity. e.g. rc = all_results.index('SUCCESS') results = [] for test in tests: res = run_test(test) -if res != "SUCCESS": -rc = 1 +if res in ("ERROR", "FAILURE") and rc == exit_code('SUCCESS'): +rc = exit_code(res) res_idx = all_results.index(res) if res_idx > rc: rc = res_idx I intended to report the first "error" or "failure" encountered. This would cause a FAILURE to overwrite previous ERROR result. Is that what you want? When running more than one test, the overall result should be the most severe. So yes, a subsequent FAILURE should override an ERROR. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [XTF PATCH v2 4/4] xtf-runner: regularise runner exit code
The script now returns the most severe result. Document the exit code in help string. Signed-off-by: Wei Liu --- xtf-runner | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/xtf-runner b/xtf-runner index 1c96750..15b98c6 100755 --- a/xtf-runner +++ b/xtf-runner @@ -251,23 +251,30 @@ def run_tests(args): if not len(tests): raise RunnerError("No tests to run") -rc = 0 +rc = all_results.index('SUCCESS') results = [] for test in tests: res = run_test(test) -if res != "SUCCESS": -rc = 1 +res_idx = all_results.index(res); +if res_idx > rc: +rc = res_idx results.append(res) +if rc == exit_code('SUCCESS'): +for res in results: +if res == 'SKIP': +rc = exit_code('SKIP') +break + print "\nCombined test results:" for test, res in zip(tests, results): print "%-40s %s" % (test, res) -return rc +return exit_code(all_results[rc]) def main(): @@ -308,6 +315,13 @@ def main(): " List all 'functional' or 'special' tests\n" "./xtf-runner --list hvm64\n" " List all 'hvm64' tests\n" + "\n" + " Exit code for this script:\n" + "0:everythin is ok\n" + "1,2: reserved for python interpreter\n" + "3:test(s) are skipped\n" + "4:test(s) report error\n" + "5:test(s) report failure\n" ), ) -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [XTF PATCH v2 3/4] xtf-runner: provide a set of exit codes for different results
Signed-off-by: Wei Liu --- xtf-runner | 10 ++ 1 file changed, 10 insertions(+) diff --git a/xtf-runner b/xtf-runner index c938f28..1c96750 100755 --- a/xtf-runner +++ b/xtf-runner @@ -21,6 +21,16 @@ except ImportError: # Note that warning is not a result on its own. all_results = ('SUCCESS', 'SKIP', 'ERROR', 'FAILURE') +# Return the exit code for different states. Avoid using 1 and 2 because +# python interpreter uses them -- see document for sys.exit. +def exit_code(state): +""" Convert a test result to an xtf-runner exit code. """ +return { "SUCCESS": 0, + "SKIP":3, + "ERROR": 4, + "FAILURE": 5, +}.get(state, 4) + # All test categories and configurations all_categories = ("special", "functional", "xsa", "utility") pv_environments = ("pv64", "pv32pae") -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [XTF PATCH v2 0/4] Runner exit code clean up
Wei Liu (4): gitignore: ignore vim swp file xtf-runner: sync all test results xtf-runner: provide a set of exit codes for different results xtf-runner: regularise runner exit code .gitignore | 1 + xtf-runner | 37 - 2 files changed, 33 insertions(+), 5 deletions(-) -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [XTF PATCH v2 1/4] gitignore: ignore vim swp file
Signed-off-by: Wei Liu --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 01243f0..56e1884 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,7 @@ *.d *.pyc *.pyo +*.swp /arch/x86/*.lds /cscope.* /dist/ -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [XTF PATCH v2 2/4] xtf-runner: sync all test results
Signed-off-by: Wei Liu --- xtf-runner | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/xtf-runner b/xtf-runner index 50a5e96..c938f28 100755 --- a/xtf-runner +++ b/xtf-runner @@ -17,6 +17,9 @@ try: except ImportError: import simplejson as json +# All results of a test, keep in sync with C code report.h. +# Note that warning is not a result on its own. +all_results = ('SUCCESS', 'SKIP', 'ERROR', 'FAILURE') # All test categories and configurations all_categories = ("special", "functional", "xsa", "utility") @@ -161,7 +164,7 @@ def run_test(test): if not "Test result:" in test_result: return "ERROR" -for res in ("SUCCESS", "SKIP", "FAILURE"): +for res in all_results: if res in test_result: return res -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
On 07/22/2016 05:34 PM, Roger Pau Monné wrote: > On Fri, Jul 22, 2016 at 04:17:48PM +0800, Bob Liu wrote: >> >> On 07/22/2016 03:45 PM, Roger Pau Monné wrote: >>> On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote: On 07/21/2016 04:57 PM, Roger Pau Monné wrote: >> ..[snip].. >> + >> +static ssize_t dynamic_reconfig_device(struct blkfront_info *info, >> ssize_t count) >> +{ >> +unsigned int i; >> +int err = -EBUSY; >> + >> +/* >> + * Make sure no migration in parallel, device lock is actually a >> + * mutex. >> + */ >> +if (!device_trylock(&info->xbdev->dev)) { >> +pr_err("Fail to acquire dev:%s lock, may be in >> migration.\n", >> +dev_name(&info->xbdev->dev)); >> +return err; >> +} >> + >> +/* >> + * Prevent new requests and guarantee no uncompleted reqs. >> + */ >> +blk_mq_freeze_queue(info->rq); >> +if (part_in_flight(&info->gd->part0)) >> +goto out; >> + >> +/* >> + * FrontBackend >> + * Switch to XenbusStateClosed >> + * frontend_changed(): >> + * case XenbusStateClosed: >> + * >> xen_blkif_disconnect() >> + * Switch to >> XenbusStateClosed >> + * blkfront_resume(): >> + * frontend_changed(): >> + * reconnect >> + * Wait until XenbusStateConnected >> + */ >> +info->reconfiguring = true; >> +xenbus_switch_state(info->xbdev, XenbusStateClosed); >> + >> +/* Poll every 100ms, 1 minute timeout. */ >> +for (i = 0; i < 600; i++) { >> +/* >> + * Wait backend enter XenbusStateClosed, >> blkback_changed() >> + * will clear reconfiguring. >> + */ >> +if (!info->reconfiguring) >> +goto resume; >> +schedule_timeout_interruptible(msecs_to_jiffies(100)); >> +} > > Instead of having this wait, could you just set info->reconfiguring = 1, > set > the frontend state to XenbusStateClosed and mimic exactly what a resume > from > suspension does? blkback_changed would have to set the frontend state to > InitWait when it detects that the backend has switched to Closed, and > call > blkfront_resume. I think that won't work. In the real "resume" case, the power management system will trigger all ->resume() path. But there is no place for dynamic configuration. >>> >>> Hello, >>> >>> I think it should be possible to set info->reconfiguring and wait for the >>> backend to switch to state Closed, at that point we should call >>> blkif_resume >>> (from blkback_changed) and the backend will follow the reconection. >>> >> >> Okay, I get your point. Yes, that's an option. >> >> But this will make 'dynamic configuration' to be async, I'm worry about the >> end-user will get panic. >> E.g >> A end-user "echo > /sys/devices/vbd-xxx/max_indirect_segs", >> but then the device will be Closed and disappeared, the user have to wait >> for a random time so that the device can resume. > > That should not happen, AFAICT on migration the device never dissapears. Oh, yes. > alloc_disk and friends should not be called on resume from migration (see > the switch in blkfront_connect, you should take the BLKIF_STATE_SUSPENDED > path for the reconfiguration). > What about if the end-user starts I/O immediately after writing new value to /sys? But the resume is still in progress. -- Regards, -Bob ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [XTF PATCH v2 4/4] xtf-runner: regularise runner exit code
On 22/07/16 10:43, Wei Liu wrote: The script now returns the most severe result. Document the exit code in help string. Signed-off-by: Wei Liu --- xtf-runner | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/xtf-runner b/xtf-runner index 1c96750..15b98c6 100755 --- a/xtf-runner +++ b/xtf-runner @@ -251,23 +251,30 @@ def run_tests(args): if not len(tests): raise RunnerError("No tests to run") -rc = 0 +rc = all_results.index('SUCCESS') results = [] for test in tests: res = run_test(test) -if res != "SUCCESS": -rc = 1 +res_idx = all_results.index(res); +if res_idx > rc: +rc = res_idx results.append(res) +if rc == exit_code('SUCCESS'): +for res in results: +if res == 'SKIP': +rc = exit_code('SKIP') +break Why is this conditional needed? SKIP has index 1 so will automatically displace SUCCESS in the change above. + print "\nCombined test results:" for test, res in zip(tests, results): print "%-40s %s" % (test, res) -return rc +return exit_code(all_results[rc]) def main(): @@ -308,6 +315,13 @@ def main(): " List all 'functional' or 'special' tests\n" "./xtf-runner --list hvm64\n" " List all 'hvm64' tests\n" + "\n" + " Exit code for this script:\n" + "0:everythin is ok\n" everything ~Andrew + "1,2: reserved for python interpreter\n" + "3:test(s) are skipped\n" + "4:test(s) report error\n" + "5:test(s) report failure\n" ), ) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 22/07/16 10:27, Corneliu ZUZU wrote: Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). Correct. Without the vcpu/domain pause, a vcpu could be midway through a vmexit path with the monitor configuration changing under its feet. OTOH, it could be running in guest context, and only receive the update one scheduling timeslice later. As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? As a resolution for this, I've been thinking of adding a 'subsys_lock' field in the vm_event_domain structure, either spinlock or rw-lock, which would be initialised/uninitialised when d.vm_event is allocated/freed (domain_create()/complete_domain_destroy()). I don't think you are missing anything. It seems like a monitor lock is needed. FWIW, the domctl lock is only used at the moment because it was easy, and worked when everything was a domctl. Being a global spinlock, it is a severe bottlekneck for concurrent domain management, which I need to find some time to break apart, so the less reliance on it the better from my point of view. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [XTF PATCH v2 4/4] xtf-runner: regularise runner exit code
On Fri, Jul 22, 2016 at 10:49:18AM +0100, Andrew Cooper wrote: > On 22/07/16 10:43, Wei Liu wrote: > > >The script now returns the most severe result. Document the exit code in > >help string. > > > >Signed-off-by: Wei Liu > >--- > > xtf-runner | 22 ++ > > 1 file changed, 18 insertions(+), 4 deletions(-) > > > >diff --git a/xtf-runner b/xtf-runner > >index 1c96750..15b98c6 100755 > >--- a/xtf-runner > >+++ b/xtf-runner > >@@ -251,23 +251,30 @@ def run_tests(args): > > if not len(tests): > > raise RunnerError("No tests to run") > >-rc = 0 > >+rc = all_results.index('SUCCESS') > > results = [] > > for test in tests: > > res = run_test(test) > >-if res != "SUCCESS": > >-rc = 1 > >+res_idx = all_results.index(res); > >+if res_idx > rc: > >+rc = res_idx > > results.append(res) > >+if rc == exit_code('SUCCESS'): > >+for res in results: > >+if res == 'SKIP': > >+rc = exit_code('SKIP') > >+break > > Why is this conditional needed? SKIP has index 1 so will automatically > displace SUCCESS in the change above. > Forgot to delete that hunk. I will resend. > >+ > > print "\nCombined test results:" > > for test, res in zip(tests, results): > > print "%-40s %s" % (test, res) > >-return rc > >+return exit_code(all_results[rc]) > > def main(): > >@@ -308,6 +315,13 @@ def main(): > >" List all 'functional' or 'special' tests\n" > >"./xtf-runner --list hvm64\n" > >" List all 'hvm64' tests\n" > >+ "\n" > >+ " Exit code for this script:\n" > >+ "0:everythin is ok\n" > > everything Fixed. Wei. > > ~Andrew > > >+ "1,2: reserved for python interpreter\n" > >+ "3:test(s) are skipped\n" > >+ "4:test(s) report error\n" > >+ "5:test(s) report failure\n" > >), > > ) > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 07/22/2016 12:27 PM, Corneliu ZUZU wrote: > Hi, > > I've been inspecting vm-event code parts to try and understand when and > why domain pausing/locking is done. If I understood correctly, domain > pausing is done solely to force all the vCPUs of that domain to see a > configuration update and act upon it (e.g. in the case of a > XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, > domain pausing/unpausing ensures immediate enabling of CR3 load-exiting > for all vCPUs), not to synchronize concurrent operations (lock-behavior). > > As for locking, I see that for example vm_event_init_domain(), > vm_event_cleanup_domain() and monitor_domctl() are all protected by the > domctl-lock, but I don't think that's enough. > Here are a few code-paths that led me to believe that: > * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* > resources, but it doesn't seem to take the domctl lock, so it seems > possible for that to happen _while_ those resources are > initialized/cleaned-up > * monitor_guest_request() also calls > monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() > which attempts a vm_event_ring_lock(ved), which could also be called > _while_ that's initialized (vm_event_enable()) or cleaned-up > (vm_event_disable()) > * hvm_monitor_cr() - e.g. on the code-path > vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() > there doesn't seem to be taken into account the possibility of a > concurrent monitor_init_domain()/monitor_cleanup_domain() > > Am I missing something with these conclusions? Your conclusions look correct, but I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved. Specifically, in a codebase where the choice between uint64_t and long int matters speed-wise, and where unlikely()s also matter, an extra lock may be an issue. The typical flow of an introspection application is: 1. Initialize everything. 2. Subscribe to relevant events. 3. Event processing loop. 4. Unsubscribe from events. 5. Do a last-run of event processing (already queued in the ring buffer). 6. Uninitialize everything (no events are possible here because of steps 4-5). > As a resolution for this, I've been thinking of adding a 'subsys_lock' > field in the vm_event_domain structure, either spinlock or rw-lock, > which would be initialised/uninitialised when d.vm_event is > allocated/freed (domain_create()/complete_domain_destroy()). I have nothing against this. Having as many assurances as possible that things will work is definitely a plus in my book - with the comment that I would prefer a rwlock to an ordinary spinlock, and that "subsys_lock" sounds obscure to me, although I admit that I can't think of a good name at the moment. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [XTF PATCH v2 4/4] xtf-runner: regularise runner exit code
Updated this patch ---8<--- From ef1f9ddfa797bb4095d48b90efc3c92a0a8fd1b6 Mon Sep 17 00:00:00 2001 From: Wei Liu Date: Mon, 13 Jun 2016 15:06:48 +0100 Subject: [XTF PATCH] xtf-runner: regularise runner exit code Cc: andrew.coop...@citrix.com The script now returns the most severe result. Document the exit code in help string. Signed-off-by: Wei Liu --- xtf-runner | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/xtf-runner b/xtf-runner index 1c96750..7743316 100755 --- a/xtf-runner +++ b/xtf-runner @@ -251,14 +251,15 @@ def run_tests(args): if not len(tests): raise RunnerError("No tests to run") -rc = 0 +rc = all_results.index('SUCCESS') results = [] for test in tests: res = run_test(test) -if res != "SUCCESS": -rc = 1 +res_idx = all_results.index(res); +if res_idx > rc: +rc = res_idx results.append(res) @@ -267,7 +268,7 @@ def run_tests(args): for test, res in zip(tests, results): print "%-40s %s" % (test, res) -return rc +return exit_code(all_results[rc]) def main(): @@ -308,6 +309,13 @@ def main(): " List all 'functional' or 'special' tests\n" "./xtf-runner --list hvm64\n" " List all 'hvm64' tests\n" + "\n" + " Exit code for this script:\n" + "0:everything is ok\n" + "1,2: reserved for python interpreter\n" + "3:test(s) are skipped\n" + "4:test(s) report error\n" + "5:test(s) report failure\n" ), ) -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
Hi Julien, On 07/22/2016 11:18 AM, Julien Grall wrote: > > > On 22/07/16 09:32, Sergej Proskurin wrote: >> Hi Julien, > > Hello Sergej, > >>> -int p2m_alloc_table(struct domain *d) >>> +static int p2m_alloc_table(struct domain *d) >> >> While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, the >> function p2m_alloc_table needs to be called from ./xen/arch/arm/altp2m.c >> to allocate the individual altp2m views. Hence it should not be static. > > No, this function should not be called outside p2m.c as it will not > fully initialize the p2m. You need to need to provide a function to > initialize a p2m (such as p2m_init). > The last time we have discussed reusing existing code, among others, for individual struct p2m_domain initialization routines. Also, we have agreed to move altp2m-related parts out of p2m.c into altp2m.c, which makes it hard not to access parts required for initialization/teardown (that are equal for both p2m and altp2m). I agree that functions that, e.g., do not entirely initialize a specific data structure should not be accessed from elsewhere. But then, we should not have moved altp2m-related information out of p2m.c as they simply need the same functionality when it comes to initialization/teardown. Best regards, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [XTF PATCH v2 4/4] xtf-runner: regularise runner exit code
On 22/07/16 10:58, Wei Liu wrote: Updated this patch Thanks. Pylint warned of an unnecessary semicolon from "res_idx = all_results.index(res);" which I dropped. Whole series Reviewed and committed. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 12:55 PM, Razvan Cojocaru wrote: On 07/22/2016 12:27 PM, Corneliu ZUZU wrote: Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? Your conclusions look correct, but I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved. I wouldn't think that was the rationale (considering that the risk is crashing the hypervisor as a whole; also see below), I'd rather think this simply went unnoticed. Specifically, in a codebase where the choice between uint64_t and long int matters speed-wise, and where unlikely()s also matter, an extra lock may be an issue. The typical flow of an introspection application is: 1. Initialize everything. 2. Subscribe to relevant events. 3. Event processing loop. 4. Unsubscribe from events. 5. Do a last-run of event processing (already queued in the ring buffer). 6. Uninitialize everything (no events are possible here because of steps 4-5). And even if an introspection application behaves as sanely as you say, the current layout of the code still doesn't guarantee bug-free behavior. That's because for one, there are in-guest events (that trigger access of vm-events resources) that are completely asynchronous in relation to 2-6, for example a HVMOP_guest_request_vm_event (subsequently calling monitor_guest_request()). As a resolution for this, I've been thinking of adding a 'subsys_lock' field in the vm_event_domain structure, either spinlock or rw-lock, which would be initialised/uninitialised when d.vm_event is allocated/freed (domain_create()/complete_domain_destroy()). I have nothing against this. Having as many assurances as possible that things will work is definitely a plus in my book - with the comment that I would prefer a rwlock to an ordinary spinlock Yep, I'd prefer that too. , and that "subsys_lock" sounds obscure to me, although I admit that I can't think of a good name at the moment. I'm referring to monitor, share & paging as "subsystems" of the whole "vm-events" subsystem (in that sense the 3 aforementioned are "sub-subsystems"). To me it's acceptable, but I'm open to better names. Thanks, Razvan Thanks, Corneliu. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 20/22] xen/arm: Don't export flush_tlb_domain
Hi Julien, On 07/22/2016 11:30 AM, Julien Grall wrote: > > > On 22/07/16 09:54, Sergej Proskurin wrote: >> Hi Julien, > > Hello Sergej, > >> On 07/20/2016 06:11 PM, Julien Grall wrote: >>> The function flush_tlb_domain is not used outside of the file where it >>> has been declared. >>> >> >> As for patch #15, the same applies here too: >> For altp2m, flush_tlb_domain/p2m_flush_tlb should be made available to >> ./xen/arch/arm/altp2m.c. > > Based on your previous version, I don't see any reason to flush call > flush_tlb_domain/p2m_flush_tlb in altp2m. > > Please justify why you would need it. > The new version considers changes that are made to the hostp2m and propagates them to all affected altp2m views by either changing individual altp2m entries or even flushing (but not destroying) the entire altp2m-tables. This idea has been borrowed from the x86 altp2m implementation. To prevent access to old/invalid GPAs, the current implementation flushes the TLBs associated with the affected altp2m view after such propagation. Best regards, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
On 22.07.2016 11:03, Ingo Jürgensmann wrote: In the meanwhile, I activated IPv6 again on Tuesday evening and today the server crashed again some minutes ago. Here's the output from netconsole: ... and the second subsequent crash: Jul 22 11:15:04 31.172.31.251 [ 2774.896036] BUG: unable to handle kernel Jul 22 11:15:04 31.172.31.251 at 88026ee6c000 Jul 22 11:15:04 31.172.31.251 [ 2774.896100] IP: Jul 22 11:15:04 31.172.31.251 [] memcpy+0x6/0x110 Jul 22 11:15:04 31.172.31.251 [ 2774.896127] PGD 1814067 Jul 22 11:15:04 31.172.31.251 Jul 22 11:15:04 31.172.31.251 [ 2774.896164] Oops: [#1] Jul 22 11:15:04 31.172.31.251 Jul 22 11:15:04 31.172.31.251 [ 2774.896189] Modules linked in: [omitting kernel modules list] Jul 22 11:15:04 31.172.31.251 [ 2774.897396] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2+deb8u3 Jul 22 11:15:04 31.172.31.251 [ 2774.897453] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.0a 01/03/2014 Jul 22 11:15:04 31.172.31.251 [ 2774.897514] task: 8181a460 ti: 8180 task.ti: 8180 Jul 22 11:15:04 31.172.31.251 [ 2774.897567] RIP: e030:[] Jul 22 11:15:04 31.172.31.251 [] memcpy+0x6/0x110 Jul 22 11:15:04 31.172.31.251 [ 2774.897625] RSP: e02b:880280e03b58 EFLAGS: 00010286 Jul 22 11:15:04 31.172.31.251 [ 2774.897658] RAX: 880264550070 RBX: 88026cbe2400 RCX: 02ee Jul 22 11:15:04 31.172.31.251 [ 2774.897694] RDX: 04a0 RSI: 88026ee6c000 RDI: 880264550222 Jul 22 11:15:04 31.172.31.251 [ 2774.897729] RBP: 880280e03c38 R08: 06c0 R09: 880264550062 Jul 22 11:15:04 31.172.31.251 [ 2774.897765] R10: 0100 R11: 8c9ab7de R12: 880250204200 Jul 22 11:15:04 31.172.31.251 [ 2774.897800] R13: 88026ee6be66 R14: 88026f2a7040 R15: 04a8 Jul 22 11:15:04 31.172.31.251 [ 2774.897842] FS: () GS:880280e0() knlGS:880280e0 Jul 22 11:15:04 31.172.31.251 [ 2774.897896] CS: e033 DS: ES: CR0: 80050033 Jul 22 11:15:04 31.172.31.251 [ 2774.897929] CR2: 88026ee6c000 CR3: 0002645c3000 CR4: 00042660 Jul 22 11:15:04 31.172.31.251 [ 2774.897965] Stack: Jul 22 11:15:04 31.172.31.251 [ 2774.897991] 814d319f Jul 22 11:15:04 31.172.31.251 88026a031000 Jul 22 11:15:04 31.172.31.251 880280e03ba8 Jul 22 11:15:04 31.172.31.251 9401294600a7012a Jul 22 11:15:04 31.172.31.251 Jul 22 11:15:04 31.172.31.251 [ 2774.898063] 0100 Jul 22 11:15:04 31.172.31.251 814a000a Jul 22 11:15:04 31.172.31.251 6fda7e90 Jul 22 11:15:04 31.172.31.251 80fe Jul 22 11:15:04 31.172.31.251 Jul 22 11:15:04 31.172.31.251 [ 2774.898133] 1ad902feff7ac40e Jul 22 11:15:04 31.172.31.251 88026fa5c6c0 Jul 22 11:15:04 31.172.31.251 224afc3e1600 Jul 22 11:15:04 31.172.31.251 880250204200 Jul 22 11:15:04 31.172.31.251 Jul 22 11:15:04 31.172.31.251 [ 2774.898205] Call Trace: Jul 22 11:15:04 31.172.31.251 [ 2774.898233] Jul 22 11:15:04 31.172.31.251 Jul 22 11:15:04 31.172.31.251 [ 2774.898242] Jul 22 11:15:04 31.172.31.251 [] ? ndisc_send_redirect+0x3bf/0x410 Jul 22 11:15:04 31.172.31.251 [ 2774.898313] [] ? reg_vif_xmit+0xca/0xf0 Jul 22 11:15:04 31.172.31.251 [ 2774.898351] [] ? ip6_forward+0x71c/0x850 Jul 22 11:15:04 31.172.31.251 [ 2774.898395] [] ? ip6_route_input+0xa4/0xd0 Jul 22 11:15:04 31.172.31.251 [ 2774.898432] [] ? __netif_receive_skb_core+0x543/0x750 Jul 22 11:15:04 31.172.31.251 [ 2774.898468] [] ? netif_receive_skb_internal+0x1f/0x80 Jul 22 11:15:04 31.172.31.251 [ 2774.898513] [] ? br_handle_frame_finish+0x1c2/0x3c0 [bridge] Jul 22 11:15:04 31.172.31.251 [ 2774.898571] [] ? br_handle_frame+0x147/0x240 [bridge] Jul 22 11:15:04 31.172.31.251 [ 2774.898706] [] ? __netif_receive_skb_core+0x1c4/0x750 Jul 22 11:15:04 31.172.31.251 [ 2774.898747] [] ? xen_clocksource_get_cycles+0x1c/0x20 Jul 22 11:15:04 31.172.31.251 [ 2774.898784] [] ? netif_receive_skb_internal+0x1f/0x80 Jul 22 11:15:04 31.172.31.251 [ 2774.898822] [] ? xenvif_tx_action+0x49f/0x920 [xen_netback] Jul 22 11:15:04 31.172.31.251 [ 2774.898877] [] ? xenvif_poll+0x28/0x70 [xen_netback] Jul 22 11:15:04 31.172.31.251 [ 2774.898914] [] ? net_rx_action+0x140/0x240 Jul 22 11:15:04 31.172.31.251 [ 2774.898950] [] ? __do_softirq+0xf1/0x290 Jul 22 11:15:04 31.172.31.251 [ 2774.898985] [] ? irq_exit+0x95/0xa0 Jul 22 11:15:04 31.172.31.251 [ 2774.899022] [] ? xen_evtchn_do_upcall+0x35/0x50 Jul 22 11:15:04 31.172.31.251 [ 2774.899059] [] ? xen_do_hypervisor_callback+0x1e/0x30 Jul 22 11:15:04 31.172.31.251 [ 2774.899093] Jul 22 11:15:04 31.172.31.251 Jul 22 11:15:04 31.172.31.251 [ 2774.899103] Jul 22 11:15:04 31.172.31.251 [] ? xen_hypercall_sched_op+0xa/0x20 Jul 22 11:15:04 31.172.31.251 [ 2774.899168] [] ? xen_hypercall_sched_op+0xa/0x20 Jul 22 11:15:04 31.172.31.251 [ 2774.
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
On 22/07/16 11:16, Sergej Proskurin wrote: Hi Julien, Hello, On 07/22/2016 11:18 AM, Julien Grall wrote: On 22/07/16 09:32, Sergej Proskurin wrote: Hi Julien, Hello Sergej, -int p2m_alloc_table(struct domain *d) +static int p2m_alloc_table(struct domain *d) While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, the function p2m_alloc_table needs to be called from ./xen/arch/arm/altp2m.c to allocate the individual altp2m views. Hence it should not be static. No, this function should not be called outside p2m.c as it will not fully initialize the p2m. You need to need to provide a function to initialize a p2m (such as p2m_init). The last time we have discussed reusing existing code, among others, for individual struct p2m_domain initialization routines. Also, we have agreed to move altp2m-related parts out of p2m.c into altp2m.c, which makes it hard not to access parts required for initialization/teardown (that are equal for both p2m and altp2m). I remember this discussion. However, the p2m initialize/teardown should exactly be the same for the hostp2m and altp2m (except for the type of the p2m). So, a function should be provided to initialize a full p2m to avoid code duplication. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 07/22/2016 01:17 PM, Corneliu ZUZU wrote: > On 7/22/2016 12:55 PM, Razvan Cojocaru wrote: >> On 07/22/2016 12:27 PM, Corneliu ZUZU wrote: >>> Hi, >>> >>> I've been inspecting vm-event code parts to try and understand when and >>> why domain pausing/locking is done. If I understood correctly, domain >>> pausing is done solely to force all the vCPUs of that domain to see a >>> configuration update and act upon it (e.g. in the case of a >>> XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, >>> domain pausing/unpausing ensures immediate enabling of CR3 load-exiting >>> for all vCPUs), not to synchronize concurrent operations >>> (lock-behavior). >>> >>> As for locking, I see that for example vm_event_init_domain(), >>> vm_event_cleanup_domain() and monitor_domctl() are all protected by the >>> domctl-lock, but I don't think that's enough. >>> Here are a few code-paths that led me to believe that: >>> * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* >>> resources, but it doesn't seem to take the domctl lock, so it seems >>> possible for that to happen _while_ those resources are >>> initialized/cleaned-up >>> * monitor_guest_request() also calls >>> monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() >>> which attempts a vm_event_ring_lock(ved), which could also be called >>> _while_ that's initialized (vm_event_enable()) or cleaned-up >>> (vm_event_disable()) >>> * hvm_monitor_cr() - e.g. on the code-path >>> vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() >>> >>> there doesn't seem to be taken into account the possibility of a >>> concurrent monitor_init_domain()/monitor_cleanup_domain() >>> >>> Am I missing something with these conclusions? >> Your conclusions look correct, but I assume that the reason why this has >> not been addressed in the past is that introspection applications are >> expected to be well-behaved. > > I wouldn't think that was the rationale (considering that the risk is > crashing the hypervisor as a whole; also see below), I'd rather think > this simply went unnoticed. > >> Specifically, in a codebase where the >> choice between uint64_t and long int matters speed-wise, and where >> unlikely()s also matter, an extra lock may be an issue. >> >> The typical flow of an introspection application is: >> >> 1. Initialize everything. >> 2. Subscribe to relevant events. >> 3. Event processing loop. >> 4. Unsubscribe from events. >> 5. Do a last-run of event processing (already queued in the ring buffer). >> 6. Uninitialize everything (no events are possible here because of steps >> 4-5). > > And even if an introspection application behaves as sanely as you say, > the current layout of the code still doesn't guarantee bug-free > behavior. That's because for one, there are in-guest events (that > trigger access of vm-events resources) that are completely asynchronous > in relation to 2-6, for example a HVMOP_guest_request_vm_event > (subsequently calling monitor_guest_request()). Not true. As part of step 4 d->monitor.guest_request_enabled will become 0, so no sensitive code will run after that: 129 void monitor_guest_request(void) 130 { 131 struct vcpu *curr = current; 132 struct domain *d = curr->domain; 133 134 if ( d->monitor.guest_request_enabled ) 135 { 136 vm_event_request_t req = { 137 .reason = VM_EVENT_REASON_GUEST_REQUEST, 138 .vcpu_id = curr->vcpu_id, 139 }; 140 141 monitor_traps(curr, d->monitor.guest_request_sync, &req); 142 } 143 } The guest _may_ try to request an event after that, but the request will be ignored. >> , and that "subsys_lock" >> sounds obscure to me, although I admit that I can't think of a good name >> at the moment. > > I'm referring to monitor, share & paging as "subsystems" of the whole > "vm-events" subsystem (in that sense the 3 aforementioned are > "sub-subsystems"). > To me it's acceptable, but I'm open to better names. If Tamas agrees with you I'll consider myself outvoted. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 12:51 PM, Andrew Cooper wrote: On 22/07/16 10:27, Corneliu ZUZU wrote: Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). Correct. Without the vcpu/domain pause, a vcpu could be midway through a vmexit path with the monitor configuration changing under its feet. OTOH, it could be running in guest context, and only receive the update one scheduling timeslice later. As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? As a resolution for this, I've been thinking of adding a 'subsys_lock' field in the vm_event_domain structure, either spinlock or rw-lock, which would be initialised/uninitialised when d.vm_event is allocated/freed (domain_create()/complete_domain_destroy()). I don't think you are missing anything. It seems like a monitor lock is needed. FWIW, the domctl lock is only used at the moment because it was easy, and worked when everything was a domctl. Being a global spinlock, it is a severe bottlekneck for concurrent domain management, which I need to find some time to break apart, so the less reliance on it the better from my point of view. ~Andrew I was thinking of not touching the way the domctl lock is acquired in these cases at all, but rather leave that as it is and introduce the rw-lock separately. Are you suggesting I should also change do_domctl to skip domctl_lock_acquire()-ing if op->cmd is XEN_DOMCTL_vm_event_op ? That would certainly be a first step towards amending that bottleneck you mentioned, but it would break consistency of locking-behavior for domctls. I think it would also add significant complexity to what I'll be doing so I hope it wouldn't be a problem if we leave that for a future patch. Corneliu. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 20/22] xen/arm: Don't export flush_tlb_domain
On 22/07/16 11:25, Sergej Proskurin wrote: Hi Julien, On 07/22/2016 11:30 AM, Julien Grall wrote: On 22/07/16 09:54, Sergej Proskurin wrote: Hi Julien, Hello Sergej, On 07/20/2016 06:11 PM, Julien Grall wrote: The function flush_tlb_domain is not used outside of the file where it has been declared. As for patch #15, the same applies here too: For altp2m, flush_tlb_domain/p2m_flush_tlb should be made available to ./xen/arch/arm/altp2m.c. Based on your previous version, I don't see any reason to flush call flush_tlb_domain/p2m_flush_tlb in altp2m. Please justify why you would need it. The new version considers changes that are made to the hostp2m and propagates them to all affected altp2m views by either changing individual altp2m entries or even flushing (but not destroying) the entire altp2m-tables. This idea has been borrowed from the x86 altp2m implementation. To prevent access to old/invalid GPAs, the current implementation flushes the TLBs associated with the affected altp2m view after such propagation. There is already a flush in apply_p2m_changes and removing all the mapping in a p2m could be implemented in p2m.c. So I still don't see why you need the flush outside. I looked at the x86 version of the propagation and I was not able to spot any explicit flush. Maybe you can provide some code to show what you mean. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
On 07/22/2016 12:26 PM, Julien Grall wrote: > > > On 22/07/16 11:16, Sergej Proskurin wrote: >> Hi Julien, > > Hello, > >> On 07/22/2016 11:18 AM, Julien Grall wrote: >>> >>> >>> On 22/07/16 09:32, Sergej Proskurin wrote: Hi Julien, >>> >>> Hello Sergej, >>> > -int p2m_alloc_table(struct domain *d) > +static int p2m_alloc_table(struct domain *d) While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, the function p2m_alloc_table needs to be called from ./xen/arch/arm/altp2m.c to allocate the individual altp2m views. Hence it should not be static. >>> >>> No, this function should not be called outside p2m.c as it will not >>> fully initialize the p2m. You need to need to provide a function to >>> initialize a p2m (such as p2m_init). >>> >> >> The last time we have discussed reusing existing code, among others, for >> individual struct p2m_domain initialization routines. Also, we have >> agreed to move altp2m-related parts out of p2m.c into altp2m.c, which >> makes it hard not to access parts required for initialization/teardown >> (that are equal for both p2m and altp2m). > > I remember this discussion. However, the p2m initialize/teardown should > exactly be the same for the hostp2m and altp2m (except for the type of > the p2m). So, a function should be provided to initialize a full p2m to > avoid code duplication. > This is exactly what has been done. Nevertheless, altp2m views are somewhat more dynamic than hostp2m and hence require frequent initialization/teardown of individual views. That is, by moving altp2m parts out of p2m.c we simply need to access this shared code multiple times at runtime from different altp2m-related functions. This applies to more functions that just to p2m_alloc_table. Best regards, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/1] ratelimit: Implement rate limit for credit2 scheduler Rate limit assures that a vcpu will execute for a minimum amount of time before being put at the back of a queue or
On Mon, 2016-07-18 at 13:22 +0100, Anshul Makkar wrote: > Hey, Anshul. Thanks, and sorry for the delay in reviewing. This version is an improvement, but it looks to me that you've missed a few of the review comments to v1. > It introduces a minimum amount of latency > "introduces context-switch rate-limiting" > to enable a VM to batch its work and > it also ensures that system is not spending most of its time in > VMEXIT/VMENTRY because of VM that is waking/sleeping at high rate. > > ratelimit can be disabled by setting it to 0. > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 8b95a47..68bcdb8 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > > @@ -1601,6 +1602,34 @@ csched2_dom_cntl( > return rc; > } > > +static int csched2_sys_cntl(const struct scheduler *ops, > +struct xen_sysctl_scheduler_op *sc) > +{ > +int rc = -EINVAL; > +xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit; > +struct csched2_private *prv = CSCHED2_PRIV(ops); > +unsigned long flags; > + > +switch (sc->cmd ) > +{ > +case XEN_SYSCTL_SCHEDOP_putinfo: > +if ( params->ratelimit_us && > +( params->ratelimit_us < CSCHED2_MIN_TIMER || > + params->ratelimit_us > > MICROSECS(CSCHED2_MAX_TIMER) )) > CSCHED2_MIN_TIMER and CSCHED2_MAX_TIMER are defined as follows: #define CSCHED2_MIN_TIMERMICROSECS(500) #define CSCHED2_MAX_TIMERMILLISECS(2) Which basically means they're value is 500*1000=50 and 2*100=200. ratelimit_us is specified assuming it will be in microseconds. So, basically, you're forcing the value stored in ratelimit_us to be at least 50, which means 50 microseconds, which means 500 milliseconds, which is not what we want! I remember saying already (although, it may have be in pvt, not on this list) that I think we should just use XEN_SYSCTL_SCHED_RATELIMIT_MAX and XEN_SYSCTL_SCHED_RATELIMIT_MIN here. CSCHED2_MIN_TIMER and CSCHED2_MAX_TIMER are internal implementation details, and I don't like them exposed (although, indirectly) to the user. > +return rc; > +spin_lock_irqsave(&prv->lock, flags); > +prv->ratelimit_us = params->ratelimit_us; > +spin_unlock_irqrestore(&prv->lock, flags); > +break; > + This is ok. However, the code base changed in the meanwhile (sorry! :- P), and this spin_lock_irqsave() needs to become a write_lock_irqsave(). > @@ -1688,9 +1719,20 @@ csched2_runtime(const struct scheduler *ops, > int cpu, struct csched2_vcpu *snext > * 1) Run until snext's credit will be 0 > * 2) But if someone is waiting, run until snext's credit is > equal > * to his > - * 3) But never run longer than MAX_TIMER or shorter than > MIN_TIMER. > + * 3) But never run longer than MAX_TIMER or shorter than > MIN_TIMER or > + * run for ratelimit time. > */ > I prefer George's version of this comment change: "3) But never run longer than MAX_TIMER or shorter than MIN_TIMER or the ratelimit time." > > +/* Calculate mintime */ > +min_time = CSCHED2_MIN_TIMER; > +if ( prv->ratelimit_us ) { > Coding style. (Parenthesis goes on next line.) > +s_time_t ratelimit_min = prv->ratelimit_us; > +ratelimit_min = snext->vcpu->runstate.state_entry_time + > +MICROSECS(prv->ratelimit_us) - now; > Mmm... if you wanted to implement my suggestion from <1468400021.13039.33.ca...@citrix.com>, you're definitely missing something: s_time_t ratelimit_min = prv->ratelimit_us; if ( snext->vcpu->is_running ) ratelimit_min = snext->vcpu->runstate.state_entry_time + MICROSECS(prv->ratelimit_us) - now; In fact, you're initializing ratelimit_min and then immediately overriding that... I'm surprised the compiler didn't complain. > +if ( ratelimit_min > min_time ) > +min_time = ratelimit_min; > +} > + > @@ -1707,32 +1749,33 @@ csched2_runtime(const struct scheduler *ops, > int cpu, struct csched2_vcpu *snext > } > } > > -/* The next guy may actually have a higher credit, if we've > tried to > - * avoid migrating him from a different cpu. DTRT. */ > -if ( rt_credit <= 0 ) > +/* > + * The next guy ont the runqueue may actually have a higher > credit, > + * if we've tried to avoid migrating him from a different cpu. > + * Setting time=0 will ensure the minimum timeslice is chosen. > George's draft patch had an empty line within this comment in here, separating the two paragraph. Can we keep it? (I know, this is a very minor thing, but since we're here :-D) > + * FIXME: See if we can eliminate this conversion if we know > time > + * will be outside (MIN,MAX). Probably requires pre-calculating > + * credit values of MIN,MAX per vcpu, since each vcpu burns > credit > + * at a diffe
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
On 22/07/16 11:39, Sergej Proskurin wrote: On 07/22/2016 12:26 PM, Julien Grall wrote: On 22/07/16 11:16, Sergej Proskurin wrote: Hi Julien, Hello, On 07/22/2016 11:18 AM, Julien Grall wrote: On 22/07/16 09:32, Sergej Proskurin wrote: Hi Julien, Hello Sergej, -int p2m_alloc_table(struct domain *d) +static int p2m_alloc_table(struct domain *d) While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, the function p2m_alloc_table needs to be called from ./xen/arch/arm/altp2m.c to allocate the individual altp2m views. Hence it should not be static. No, this function should not be called outside p2m.c as it will not fully initialize the p2m. You need to need to provide a function to initialize a p2m (such as p2m_init). The last time we have discussed reusing existing code, among others, for individual struct p2m_domain initialization routines. Also, we have agreed to move altp2m-related parts out of p2m.c into altp2m.c, which makes it hard not to access parts required for initialization/teardown (that are equal for both p2m and altp2m). I remember this discussion. However, the p2m initialize/teardown should exactly be the same for the hostp2m and altp2m (except for the type of the p2m). So, a function should be provided to initialize a full p2m to avoid code duplication. This is exactly what has been done. Nevertheless, altp2m views are somewhat more dynamic than hostp2m and hence require frequent initialization/teardown of individual views. That is, by moving altp2m parts out of p2m.c we simply need to access this shared code multiple times at runtime from different altp2m-related functions. This applies to more functions that just to p2m_alloc_table. I am not convinced that you need to reallocate the root page every time rather than clearing them. Anyway, I will need to see the code to understand what is done. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] OVMF very slow on AMD
On Mon, 2016-07-18 at 16:09 +0100, Anthony PERARD wrote: > > $ dwdiff procinfo_guest_ovmf_kvm procinfo_guest_ovmf_xen > processor : 0 > vendor_id : AuthenticAMD > cpu family: [-6-] {+21+} > model : [-6-] {+1+} > model name: [-QEMU Virtual CPU version 2.5+-] {+AMD > Opteron(tm) Processor 4284+} > stepping : [-3-] {+2+} > microcode : [-0x165-] {+0x600063d+} > Is it ok / expected for this (microcode, I mean) to be different? BTW, I'm super ignorant about all it's being discussed here, so this can well be completely off! Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 20/22] xen/arm: Don't export flush_tlb_domain
On 07/22/2016 12:34 PM, Julien Grall wrote: > > > On 22/07/16 11:25, Sergej Proskurin wrote: >> Hi Julien, >> >> On 07/22/2016 11:30 AM, Julien Grall wrote: >>> >>> >>> On 22/07/16 09:54, Sergej Proskurin wrote: Hi Julien, >>> >>> Hello Sergej, >>> On 07/20/2016 06:11 PM, Julien Grall wrote: > The function flush_tlb_domain is not used outside of the file where it > has been declared. > As for patch #15, the same applies here too: For altp2m, flush_tlb_domain/p2m_flush_tlb should be made available to ./xen/arch/arm/altp2m.c. >>> >>> Based on your previous version, I don't see any reason to flush call >>> flush_tlb_domain/p2m_flush_tlb in altp2m. >>> >>> Please justify why you would need it. >>> >> >> The new version considers changes that are made to the hostp2m and >> propagates them to all affected altp2m views by either changing >> individual altp2m entries or even flushing (but not destroying) the >> entire altp2m-tables. This idea has been borrowed from the x86 altp2m >> implementation. >> >> To prevent access to old/invalid GPAs, the current implementation >> flushes the TLBs associated with the affected altp2m view after such >> propagation. > > There is already a flush in apply_p2m_changes and removing all the > mapping in a p2m could be implemented in p2m.c. So I still don't see why > you need the flush outside. > Yes, the flush you are referring to flushes the hostp2m - not the individual altp2m views. > I looked at the x86 version of the propagation and I was not able to > spot any explicit flush. Maybe you can provide some code to show what > you mean. > Sure thing: ... static void p2m_reset_altp2m(struct p2m_domain *p2m) { p2m_flush_table(p2m); /* Uninit and reinit ept to force TLB shootdown */ ept_p2m_uninit(p2m); ept_p2m_init(p2m); p2m->min_remapped_gfn = INVALID_GFN; p2m->max_remapped_gfn = 0; } ... On x86, the uninit- and re-initialization of the EPTs force the TLBs associated with the configured VMID of the EPTs to flush. Regards, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 97737: regressions - FAIL
On Fri, Jul 22, 2016 at 03:27:30AM +, osstest service owner wrote: > flight 97737 xen-unstable real [real] > http://logs.test-lab.xenproject.org/osstest/logs/97737/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > test-amd64-amd64-xl-credit2 19 guest-start/debian.repeat fail REGR. vs. > 97664 > test-armhf-armhf-xl 15 guest-start/debian.repeat fail REGR. vs. > 97664 > test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail REGR. vs. > 97664 From http://logs.test-lab.xenproject.org/osstest/logs/97737/test-amd64-amd64-xl-credit2/serial-fiano0.log Jul 21 07:38:22.383917 (XEN) Assertion 'rqd->avgload >= 0 && rqd->b_avgload >= 0' failed at sched_credit2.c:734 Jul 21 07:38:23.27 (XEN) [ Xen-4.8-unstable x86_64 debug=y Not tainted ] Jul 21 07:38:23.207896 (XEN) CPU:2 Jul 21 07:38:23.207943 (XEN) RIP:e008:[] sched_credit2.c#__update_runq_load+0xde/0x11e Jul 21 07:38:23.215908 (XEN) RFLAGS: 00010082 CONTEXT: hypervisor Jul 21 07:38:23.215970 (XEN) rax: 502c rbx: 502c rcx: a1df Jul 21 07:38:23.223906 (XEN) rdx: 0001 rsi: 83027d800460 rdi: 38f6e47e Jul 21 07:38:23.231882 (XEN) rbp: 830277eb7c10 rsp: 830277eb7be0 r8: fd71 Jul 21 07:38:23.239900 (XEN) r9: 0012 r10: 0014 r11: 022d Jul 21 07:38:23.247906 (XEN) r12: 830277d8f900 r13: 0001 r14: 00e3db91fb86 Jul 21 07:38:23.255909 (XEN) r15: 83027d800460 cr0: 8005003b cr4: 001526e0 Jul 21 07:38:23.255974 (XEN) cr3: 7dcfd000 cr2: 7f81409f1b20 Jul 21 07:38:23.263910 (XEN) ds: 002b es: 002b fs: gs: ss: e010 cs: e008 Jul 21 07:38:23.272145 (XEN) Xen code around (sched_credit2.c#__update_runq_load+0xde/0x11e): Jul 21 07:38:23.279948 (XEN) 00 00 00 48 85 c9 79 02 <0f> 0b 83 3d 44 2e 1c 00 00 74 2e 8b 36 40 88 75 Jul 21 07:38:23.287896 (XEN) Xen stack trace from rsp=830277eb7be0: Jul 21 07:38:23.287960 (XEN)0297 830277eb7c00 82d080130969 0026785f Jul 21 07:38:23.295904 (XEN)830277eb7c60 82d0802e96c0 830277eb7c50 82d080126b19 Jul 21 07:38:23.303909 (XEN)0297 830277d8f900 00e3db91fb86 Jul 21 07:38:23.311948 (XEN)82d0802e96c0 83027d800464 830277eb7c80 82d08012956e Jul 21 07:38:23.319895 (XEN)83007dafc000 00e3db91f9b6 82d080344140 83027d800464 Jul 21 07:38:23.319969 (XEN)830277eb7ce0 82d08012de99 0297 0046 Jul 21 07:38:23.327929 (XEN)82d080130969 00266c7b 830277eb7d10 83007dafc000 Jul 21 07:38:23.335903 (XEN) 830277ebd000 83027d8450b8 82c00021001c Jul 21 07:38:23.343911 (XEN)830277eb7cf0 82d08012e393 830277eb7d10 82d08016b7d4 Jul 21 07:38:23.351904 (XEN)0007 002e 830277eb7d20 82d08016b844 Jul 21 07:38:23.359911 (XEN)830277eb7d90 82d08010ab2f 82c0002100b8 072e Jul 21 07:38:23.367896 (XEN)83027d845010 0086 83007dafc000 0007 Jul 21 07:38:23.367966 (XEN)830277eb7d78 0003 830277ebd000 83007dafc19c Jul 21 07:38:23.375919 (XEN)0286 83007dafc000 830277eb7dd0 82d08010847b Jul 21 07:38:23.383904 (XEN) 83007dc9d000 8302630b3aa8 Jul 21 07:38:23.391904 (XEN)8302630b3000 830277da4a28 830277eb7e00 82d080105c7a Jul 21 07:38:23.399910 (XEN)830277eba040 830277eb7fff Jul 21 07:38:23.407908 (XEN)830277eb7e30 82d080122a08 82d08031aa80 82d08031ab80 Jul 21 07:38:23.407977 (XEN)82d08031aa80 fffd 830277eb7e60 82d080130116 Jul 21 07:38:23.415913 (XEN)0002 830277da4970 0004 Jul 21 07:38:23.423908 (XEN) Xen call trace: Jul 21 07:38:23.423964 (XEN)[] sched_credit2.c#__update_runq_load+0xde/0x11e Jul 21 07:38:23.431916 (XEN)[] sched_credit2.c#update_load+0x53/0x78 Jul 21 07:38:23.439900 (XEN)[] sched_credit2.c#csched2_vcpu_wake+0xf7/0x119 Jul 21 07:38:23.447913 (XEN)[] vcpu_wake+0x23f/0x3cb Jul 21 07:38:23.447979 (XEN)[] vcpu_unblock+0x4b/0x4e Jul 21 07:38:23.455908 (XEN)[] vcpu_kick+0x17/0x5b Jul 21 07:38:23.463889 (XEN)[] vcpu_mark_events_pending+0x2c/0x2f Jul 21 07:38:23.463962 (XEN)[] event_fifo.c#evtchn_fifo_set_pending+0x36a/0x3de Jul 21 07:38:23.471914 (XEN)[] send_global_virq+0xe7/0x117 Jul 21 07:38:23.479916 (XEN)[] domain.c#complete_domain_destroy+0x179/0x182 Jul 21 07:38:23.487875 (XEN)[] rcupdate.c#rcu_process_callbacks+0x141/0x1a2 Jul 21 07:38:23.487915 (XEN)[] softirq.c#__d
Re: [Xen-devel] [PATCH 20/22] xen/arm: Don't export flush_tlb_domain
On 22/07/16 11:46, Sergej Proskurin wrote: On 07/22/2016 12:34 PM, Julien Grall wrote: On 22/07/16 11:25, Sergej Proskurin wrote: Hi Julien, On 07/22/2016 11:30 AM, Julien Grall wrote: On 22/07/16 09:54, Sergej Proskurin wrote: Hi Julien, Hello Sergej, On 07/20/2016 06:11 PM, Julien Grall wrote: The function flush_tlb_domain is not used outside of the file where it has been declared. As for patch #15, the same applies here too: For altp2m, flush_tlb_domain/p2m_flush_tlb should be made available to ./xen/arch/arm/altp2m.c. Based on your previous version, I don't see any reason to flush call flush_tlb_domain/p2m_flush_tlb in altp2m. Please justify why you would need it. The new version considers changes that are made to the hostp2m and propagates them to all affected altp2m views by either changing individual altp2m entries or even flushing (but not destroying) the entire altp2m-tables. This idea has been borrowed from the x86 altp2m implementation. To prevent access to old/invalid GPAs, the current implementation flushes the TLBs associated with the affected altp2m view after such propagation. There is already a flush in apply_p2m_changes and removing all the mapping in a p2m could be implemented in p2m.c. So I still don't see why you need the flush outside. Yes, the flush you are referring to flushes the hostp2m - not the individual altp2m views. apply_p2m_changes is *not* hostp2m specific. It should work on any p2m regardless the type. The ARM P2M interface is not set in stone, so if it does not fit it will need to be changed. We should avoid to hack the code in order to add a new feature. It might be time to mention that I am reworking the whole p2m code it does not respect the ARM spec (such as break-before-make semantics) and I believe it does not fit the altp2m model. It is very difficult to implement the former with the current implementation and without have a big performance impact. Rather than having a function which implement all the operations, I am planning to have a simple set of functions that can be used to re-implement the operations: - p2m_set_entry: Set an entry in the P2M - p2m_get_entry: Retrieve the informations of an entry This is very similar to x86 and make more straight forward to implement new operations and co-op with the ARM spec. I have already a prototype and I am hoping to send it soon. I looked at the x86 version of the propagation and I was not able to spot any explicit flush. Maybe you can provide some code to show what you mean. Sure thing: ... static void p2m_reset_altp2m(struct p2m_domain *p2m) { p2m_flush_table(p2m); /* Uninit and reinit ept to force TLB shootdown */ ept_p2m_uninit(p2m); ept_p2m_init(p2m); p2m->min_remapped_gfn = INVALID_GFN; p2m->max_remapped_gfn = 0; } ... On x86, the uninit- and re-initialization of the EPTs force the TLBs associated with the configured VMID of the EPTs to flush. As mentioned in my previous mail, p2m_reset can be implemented in p2m.c as this is not altp2m.c specific. When I asked to move altp2m specific code from p2m.c to altp2m.c it was for avoiding to have p2m.c too big. However, if the function is not altp2m specific, there is little reason to move outside. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
On 07/22/2016 12:38 PM, Julien Grall wrote: > > > On 22/07/16 11:39, Sergej Proskurin wrote: >> >> >> On 07/22/2016 12:26 PM, Julien Grall wrote: >>> >>> >>> On 22/07/16 11:16, Sergej Proskurin wrote: Hi Julien, >>> >>> Hello, >>> On 07/22/2016 11:18 AM, Julien Grall wrote: > > > On 22/07/16 09:32, Sergej Proskurin wrote: >> Hi Julien, > > Hello Sergej, > >>> -int p2m_alloc_table(struct domain *d) >>> +static int p2m_alloc_table(struct domain *d) >> >> While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, >> the >> function p2m_alloc_table needs to be called from >> ./xen/arch/arm/altp2m.c >> to allocate the individual altp2m views. Hence it should not be >> static. > > No, this function should not be called outside p2m.c as it will not > fully initialize the p2m. You need to need to provide a function to > initialize a p2m (such as p2m_init). > The last time we have discussed reusing existing code, among others, for individual struct p2m_domain initialization routines. Also, we have agreed to move altp2m-related parts out of p2m.c into altp2m.c, which makes it hard not to access parts required for initialization/teardown (that are equal for both p2m and altp2m). >>> >>> I remember this discussion. However, the p2m initialize/teardown should >>> exactly be the same for the hostp2m and altp2m (except for the type of >>> the p2m). So, a function should be provided to initialize a full p2m to >>> avoid code duplication. >>> >> >> This is exactly what has been done. Nevertheless, altp2m views are >> somewhat more dynamic than hostp2m and hence require frequent >> initialization/teardown of individual views. That is, by moving altp2m >> parts out of p2m.c we simply need to access this shared code multiple >> times at runtime from different altp2m-related functions. This applies >> to more functions that just to p2m_alloc_table. > > I am not convinced that you need to reallocate the root page every time > rather than clearing them. Anyway, I will need to see the code to > understand what is done. > > Regards, > In the following, you will find an excerpt of both the p2m.c and altp2m.c concerning the (alt)p2m initialization. Please note that altp2m_init_helper is called from different initialization routines in altp2m.c. Also, please consider that this code has not yet been ported to your recent patches. Please let me know if you need more information. --- ./xen/arch/p2m.c: ... int p2m_alloc_table(struct p2m_domain *p2m) { unsigned int i; struct page_info *page; struct vttbr *vttbr = &p2m->vttbr; page = alloc_domheap_pages(NULL, P2M_ROOT_ORDER, 0); if ( page == NULL ) return -ENOMEM; /* Clear all first level pages */ for ( i = 0; i < P2M_ROOT_PAGES; i++ ) clear_and_clean_page(page + i); p2m->root = page; /* Initialize the VTTBR associated with the allocated p2m table. */ vttbr->vttbr = 0; vttbr->vmid = p2m->vmid & 0xff; vttbr->baddr = page_to_maddr(p2m->root); return 0; } int p2m_init_one(struct domain *d, struct p2m_domain *p2m) { int rc = 0; spin_lock_init(&p2m->lock); INIT_PAGE_LIST_HEAD(&p2m->pages); spin_lock(&p2m->lock); p2m->vmid = INVALID_VMID; rc = p2m_alloc_vmid(d, p2m); if ( rc != 0 ) goto err; p2m->domain = d; p2m->access_required = false; p2m->mem_access_enabled = false; p2m->default_access = p2m_access_rwx; p2m->root = NULL; p2m->vttbr.vttbr = INVALID_VTTBR; p2m->max_mapped_gfn = 0; p2m->lowest_mapped_gfn = INVALID_GFN; radix_tree_init(&p2m->mem_access_settings); err: spin_unlock(&p2m->lock); return rc; } static int p2m_init_hostp2m(struct domain *d) { struct p2m_domain *p2m = p2m_get_hostp2m(d); d->arch.vttbr = INVALID_VTTBR; p2m->p2m_class = p2m_host; return p2m_init_one(d, p2m); } int p2m_init(struct domain *d) { int rc; rc = p2m_init_hostp2m(d); if ( rc ) return rc; return altp2m_init(d); } ... --- ./xen/arch/altp2m.c: ... static int altp2m_init_helper(struct domain *d, unsigned int idx) { int rc; struct p2m_domain *p2m = d->arch.altp2m_p2m[idx]; if ( p2m == NULL ) { /* Allocate a new altp2m view. */ p2m = xzalloc(struct p2m_domain); if ( p2m == NULL) { rc = -ENOMEM; goto err; } memset(p2m, 0, sizeof(struct p2m_domain)); } /* Initialize the new altp2m view. */ rc = p2m_init_one(d, p2m); if ( rc ) goto err; /* Allocate a root table for the altp2m view. */ rc = p2m_alloc_table(p2m); if ( rc ) goto err; p2m->p2m_class = p2m_alternate; p2m->access_required = 1; _atomic_set(&p2m->active_vcpus, 0); d->arch.altp2m_p2m[idx] = p2m; d->arch.altp2m_vttbr[idx] = p2m->vttbr.vttbr; /* * M
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 22/07/16 11:31, Corneliu ZUZU wrote: On 7/22/2016 12:51 PM, Andrew Cooper wrote: On 22/07/16 10:27, Corneliu ZUZU wrote: Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). Correct. Without the vcpu/domain pause, a vcpu could be midway through a vmexit path with the monitor configuration changing under its feet. OTOH, it could be running in guest context, and only receive the update one scheduling timeslice later. As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? As a resolution for this, I've been thinking of adding a 'subsys_lock' field in the vm_event_domain structure, either spinlock or rw-lock, which would be initialised/uninitialised when d.vm_event is allocated/freed (domain_create()/complete_domain_destroy()). I don't think you are missing anything. It seems like a monitor lock is needed. FWIW, the domctl lock is only used at the moment because it was easy, and worked when everything was a domctl. Being a global spinlock, it is a severe bottlekneck for concurrent domain management, which I need to find some time to break apart, so the less reliance on it the better from my point of view. ~Andrew I was thinking of not touching the way the domctl lock is acquired in these cases at all, but rather leave that as it is and introduce the rw-lock separately. Are you suggesting I should also change do_domctl to skip domctl_lock_acquire()-ing if op->cmd is XEN_DOMCTL_vm_event_op ? That would certainly be a first step towards amending that bottleneck you mentioned, but it would break consistency of locking-behavior for domctls. I think it would also add significant complexity to what I'll be doing so I hope it wouldn't be a problem if we leave that for a future patch. Sorry - I didn't intend to suggest altering the domctl lock. Just that adding in a new monitor lock and using it everywhere where appropriate would be a good step. Breaking the domctl lock is a very large can of worms, which you definitely won't want to open while attempting to do something else. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 20/22] xen/arm: Don't export flush_tlb_domain
On 07/22/2016 12:57 PM, Julien Grall wrote: > > > On 22/07/16 11:46, Sergej Proskurin wrote: >> >> >> On 07/22/2016 12:34 PM, Julien Grall wrote: >>> >>> >>> On 22/07/16 11:25, Sergej Proskurin wrote: Hi Julien, On 07/22/2016 11:30 AM, Julien Grall wrote: > > > On 22/07/16 09:54, Sergej Proskurin wrote: >> Hi Julien, > > Hello Sergej, > >> On 07/20/2016 06:11 PM, Julien Grall wrote: >>> The function flush_tlb_domain is not used outside of the file >>> where it >>> has been declared. >>> >> >> As for patch #15, the same applies here too: >> For altp2m, flush_tlb_domain/p2m_flush_tlb should be made >> available to >> ./xen/arch/arm/altp2m.c. > > Based on your previous version, I don't see any reason to flush call > flush_tlb_domain/p2m_flush_tlb in altp2m. > > Please justify why you would need it. > The new version considers changes that are made to the hostp2m and propagates them to all affected altp2m views by either changing individual altp2m entries or even flushing (but not destroying) the entire altp2m-tables. This idea has been borrowed from the x86 altp2m implementation. To prevent access to old/invalid GPAs, the current implementation flushes the TLBs associated with the affected altp2m view after such propagation. >>> >>> There is already a flush in apply_p2m_changes and removing all the >>> mapping in a p2m could be implemented in p2m.c. So I still don't see why >>> you need the flush outside. >>> >> >> Yes, the flush you are referring to flushes the hostp2m - not the >> individual altp2m views. > > apply_p2m_changes is *not* hostp2m specific. It should work on any p2m > regardless the type. > This true. However, p2m_propagate_change is invoked (from apply_p2m_changes) only if the p2m that was currently modified is the hostp2m. > The ARM P2M interface is not set in stone, so if it does not fit it will > need to be changed. We should avoid to hack the code in order to add a > new feature. > > It might be time to mention that I am reworking the whole p2m code it > does not respect the ARM spec (such as break-before-make semantics) and > I believe it does not fit the altp2m model. It is very difficult to > implement the former with the current implementation and without have a > big performance impact. > > Rather than having a function which implement all the operations, I am > planning to have a simple set of functions that can be used to > re-implement the operations: > - p2m_set_entry: Set an entry in the P2M > - p2m_get_entry: Retrieve the informations of an entry > > This is very similar to x86 and make more straight forward to implement > new operations and co-op with the ARM spec. > > I have already a prototype and I am hoping to send it soon. > >> >>> I looked at the x86 version of the propagation and I was not able to >>> spot any explicit flush. Maybe you can provide some code to show what >>> you mean. >>> >> >> Sure thing: >> >> ... >> >> static void p2m_reset_altp2m(struct p2m_domain *p2m) >> { >> p2m_flush_table(p2m); >> /* Uninit and reinit ept to force TLB shootdown */ >> ept_p2m_uninit(p2m); >> ept_p2m_init(p2m); >> p2m->min_remapped_gfn = INVALID_GFN; >> p2m->max_remapped_gfn = 0; >> } >> >> ... >> >> On x86, the uninit- and re-initialization of the EPTs force the TLBs >> associated with the configured VMID of the EPTs to flush. > > As mentioned in my previous mail, p2m_reset can be implemented in p2m.c > as this is not altp2m.c specific. > Well yes. However, it is not used for the hostp2m, thus it makes it automatically altp2m specific - but I know what you mean. Yet, I believe it is cleaner to separate the entire altp2m code and maintain it in altp2m.c. Nevertheless, I will need to pull back parts of altp2m code into p2m.c, if we will not share some of the initialization/teardown functions between both files. Regards, ~Sergej ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 1:29 PM, Razvan Cojocaru wrote: On 07/22/2016 01:17 PM, Corneliu ZUZU wrote: On 7/22/2016 12:55 PM, Razvan Cojocaru wrote: On 07/22/2016 12:27 PM, Corneliu ZUZU wrote: Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? Your conclusions look correct, but I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved. I wouldn't think that was the rationale (considering that the risk is crashing the hypervisor as a whole; also see below), I'd rather think this simply went unnoticed. Specifically, in a codebase where the choice between uint64_t and long int matters speed-wise, and where unlikely()s also matter, an extra lock may be an issue. The typical flow of an introspection application is: 1. Initialize everything. 2. Subscribe to relevant events. 3. Event processing loop. 4. Unsubscribe from events. 5. Do a last-run of event processing (already queued in the ring buffer). 6. Uninitialize everything (no events are possible here because of steps 4-5). And even if an introspection application behaves as sanely as you say, the current layout of the code still doesn't guarantee bug-free behavior. That's because for one, there are in-guest events (that trigger access of vm-events resources) that are completely asynchronous in relation to 2-6, for example a HVMOP_guest_request_vm_event (subsequently calling monitor_guest_request()). Not true. As part of step 4 d->monitor.guest_request_enabled will become 0, so no sensitive code will run after that: I wasn't being rigorous with 2-6 above (with that I only meant to say 'there can't be any vm-events before everything is initialized') and it seems true that at least in the monitor_guest_request() case it might not be able to produce a hypervisor crash (although there are a number of ugly inconsistencies on that code path). But nonetheless a hypervisor crash can _still_ happen even with a _sane_ toolstack user. Look @ hvm_do_resume(): 1. If you, as a toolstack user, get at sane step no. 6: "Uninitialize everything (no events are possible here because of steps 4-5)." 2. But just before you do that, a hvm_do_resume() happens on an arbitrary vCPU of the domain and gets to this point: if ( unlikely(v->arch.vm_event) ) { // ---> let's say you are now at this point, just after the above non-NULL check struct monitor_write_data *w = &v->arch.vm_event->write_data; 3. and before proceeding further, the uninitialization sequence _completes_, i.e. this was done in vm_event_cleanup_domain(): for_each_vcpu ( d, v ) { xfree(v->arch.vm_event); v->arch.vm_event = NULL; } 4. and then in hvm_do_resume() this gets to be done: struct monitor_write_data *w = &v->arch.vm_event->write_data; [...] if ( w->do_write.msr ) { hvm_msr_write_intercept(w->msr, w->value, 0); w->do_write.msr = 0; } then you'll have a beautiful NULL-pointer access hypervisor crash. 129 void monitor_guest_request(void) 130 { 131 struct vcpu *curr = current; 132 struct domain *d = curr->domain; 133 134 if ( d->monitor.guest_request_enabled ) 135 { 136 vm_event_request_t req = { 137 .reason = VM_EVENT_REASON_GUEST_REQUEST, 138 .vcpu_id = curr->vcpu_id, 139 }; 140 141 monitor_traps(curr, d->monitor.guest_request_sync, &req); 142
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 2:13 PM, Andrew Cooper wrote: On 22/07/16 11:31, Corneliu ZUZU wrote: On 7/22/2016 12:51 PM, Andrew Cooper wrote: On 22/07/16 10:27, Corneliu ZUZU wrote: Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). Correct. Without the vcpu/domain pause, a vcpu could be midway through a vmexit path with the monitor configuration changing under its feet. OTOH, it could be running in guest context, and only receive the update one scheduling timeslice later. As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? As a resolution for this, I've been thinking of adding a 'subsys_lock' field in the vm_event_domain structure, either spinlock or rw-lock, which would be initialised/uninitialised when d.vm_event is allocated/freed (domain_create()/complete_domain_destroy()). I don't think you are missing anything. It seems like a monitor lock is needed. FWIW, the domctl lock is only used at the moment because it was easy, and worked when everything was a domctl. Being a global spinlock, it is a severe bottlekneck for concurrent domain management, which I need to find some time to break apart, so the less reliance on it the better from my point of view. ~Andrew I was thinking of not touching the way the domctl lock is acquired in these cases at all, but rather leave that as it is and introduce the rw-lock separately. Are you suggesting I should also change do_domctl to skip domctl_lock_acquire()-ing if op->cmd is XEN_DOMCTL_vm_event_op ? That would certainly be a first step towards amending that bottleneck you mentioned, but it would break consistency of locking-behavior for domctls. I think it would also add significant complexity to what I'll be doing so I hope it wouldn't be a problem if we leave that for a future patch. Sorry - I didn't intend to suggest altering the domctl lock. Just that adding in a new monitor lock and using it everywhere where appropriate would be a good step. Breaking the domctl lock is a very large can of worms, which you definitely won't want to open while attempting to do something else. ~Andrew Yep, agreed. Corneliu. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
On Fri, Jul 22, 2016 at 05:43:32PM +0800, Bob Liu wrote: > > On 07/22/2016 05:34 PM, Roger Pau Monné wrote: > > On Fri, Jul 22, 2016 at 04:17:48PM +0800, Bob Liu wrote: > >> > >> On 07/22/2016 03:45 PM, Roger Pau Monné wrote: > >>> On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote: > > On 07/21/2016 04:57 PM, Roger Pau Monné wrote: > >> ..[snip].. > >> + > >> +static ssize_t dynamic_reconfig_device(struct blkfront_info *info, > >> ssize_t count) > >> +{ > >> + unsigned int i; > >> + int err = -EBUSY; > >> + > >> + /* > >> + * Make sure no migration in parallel, device lock is actually a > >> + * mutex. > >> + */ > >> + if (!device_trylock(&info->xbdev->dev)) { > >> + pr_err("Fail to acquire dev:%s lock, may be in > >> migration.\n", > >> + dev_name(&info->xbdev->dev)); > >> + return err; > >> + } > >> + > >> + /* > >> + * Prevent new requests and guarantee no uncompleted reqs. > >> + */ > >> + blk_mq_freeze_queue(info->rq); > >> + if (part_in_flight(&info->gd->part0)) > >> + goto out; > >> + > >> + /* > >> + * FrontBackend > >> + * Switch to XenbusStateClosed > >> + * frontend_changed(): > >> + * case XenbusStateClosed: > >> + * > >> xen_blkif_disconnect() > >> + * Switch to > >> XenbusStateClosed > >> + * blkfront_resume(): > >> + * frontend_changed(): > >> + * reconnect > >> + * Wait until XenbusStateConnected > >> + */ > >> + info->reconfiguring = true; > >> + xenbus_switch_state(info->xbdev, XenbusStateClosed); > >> + > >> + /* Poll every 100ms, 1 minute timeout. */ > >> + for (i = 0; i < 600; i++) { > >> + /* > >> + * Wait backend enter XenbusStateClosed, > >> blkback_changed() > >> + * will clear reconfiguring. > >> + */ > >> + if (!info->reconfiguring) > >> + goto resume; > >> + schedule_timeout_interruptible(msecs_to_jiffies(100)); > >> + } > > > > Instead of having this wait, could you just set info->reconfiguring = > > 1, set > > the frontend state to XenbusStateClosed and mimic exactly what a resume > > from > > suspension does? blkback_changed would have to set the frontend state > > to > > InitWait when it detects that the backend has switched to Closed, and > > call > > blkfront_resume. > > > I think that won't work. > In the real "resume" case, the power management system will trigger all > ->resume() path. > But there is no place for dynamic configuration. > >>> > >>> Hello, > >>> > >>> I think it should be possible to set info->reconfiguring and wait for the > >>> backend to switch to state Closed, at that point we should call > >>> blkif_resume > >>> (from blkback_changed) and the backend will follow the reconection. > >>> > >> > >> Okay, I get your point. Yes, that's an option. > >> > >> But this will make 'dynamic configuration' to be async, I'm worry about > >> the end-user will get panic. > >> E.g > >> A end-user "echo > /sys/devices/vbd-xxx/max_indirect_segs", > >> but then the device will be Closed and disappeared, the user have to wait > >> for a random time so that the device can resume. > > > > That should not happen, AFAICT on migration the device never dissapears. > > Oh, yes. > > > alloc_disk and friends should not be called on resume from migration (see > > the switch in blkfront_connect, you should take the BLKIF_STATE_SUSPENDED > > path for the reconfiguration). > > > > What about if the end-user starts I/O immediately after writing new value to > /sys? > But the resume is still in progress. blkif_free already stops the queues by calling blk_mq_stop_hw_queues, and blkif_queue_request will refuse to put anything on the ring if the state is different than connected, which in turn makes blkif_queue_rq call blk_mq_stop_hw_queue to stop the queue, so it should be safe. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [distros-debian-jessie test] 66673: tolerable FAIL
flight 66673 distros-debian-jessie real [real] http://osstest.xs.citrite.net/~osstest/testlogs/logs/66673/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): build-amd64 5 xen-buildfail blocked in 66571 build-i3865 xen-buildfail blocked in 66571 Tests which did not succeed, but are not blocking: test-amd64-i386-amd64-jessie-netboot-pygrub 1 build-check(1) blocked n/a test-amd64-amd64-i386-jessie-netboot-pygrub 1 build-check(1) blocked n/a test-amd64-amd64-amd64-jessie-netboot-pvgrub 1 build-check(1) blocked n/a test-amd64-i386-i386-jessie-netboot-pvgrub 1 build-check(1) blocked n/a test-armhf-armhf-armhf-jessie-netboot-pygrub 11 migrate-support-check fail never pass test-armhf-armhf-armhf-jessie-netboot-pygrub 12 saverestore-support-check fail never pass baseline version: flight 66571 jobs: build-amd64 fail build-armhf pass build-i386 fail build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-amd64-jessie-netboot-pvgrub blocked test-amd64-i386-i386-jessie-netboot-pvgrub blocked test-amd64-i386-amd64-jessie-netboot-pygrub blocked test-armhf-armhf-armhf-jessie-netboot-pygrub pass test-amd64-amd64-i386-jessie-netboot-pygrub blocked sg-report-flight on osstest.xs.citrite.net logs: /home/osstest/logs images: /home/osstest/images Logs, config files, etc. are available at http://osstest.xs.citrite.net/~osstest/testlogs/logs Test harness code can be found at http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary Push not applicable. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen: credit2: don't let b_avgload go negative.
The ASSERT() made effective by b5b5876619bd8ec2e ("xen: credit2: fix two s_time_t handling issues in load balancing") triggers for b_avgload (spotted by OSSTest). b_avgload is where we store the prediction of how the load of a runqueue will look like in the medium to long term, because of a vcpu being added to or removed from there. On vcpu removal, saturate down b_avgload to zero, as it makes very few sense to predict that the load of a runqueue will at some point become negative! Signed-off-by: Dario Faggioli --- Cc: George Dunlap Cc: Wei Liu --- xen/common/sched_credit2.c |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index b92226c..1d79de0 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -1323,14 +1323,16 @@ runq_assign(const struct scheduler *ops, struct vcpu *vc) static void __runq_deassign(struct csched2_vcpu *svc) { +struct csched2_runqueue_data *rqd = svc->rqd; + ASSERT(!__vcpu_on_runq(svc)); ASSERT(!(svc->flags & CSFLAG_scheduled)); list_del_init(&svc->rqd_elem); -update_max_weight(svc->rqd, 0, svc->weight); +update_max_weight(rqd, 0, svc->weight); /* Expected new load based on removing this vcpu */ -svc->rqd->b_avgload -= svc->avgload; +rqd->b_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0); svc->rqd = NULL; } @@ -1590,7 +1592,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc) if ( rqd == svc->rqd ) { if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) ) -rqd_avgload = rqd->b_avgload - svc->avgload; +rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0); } else if ( spin_trylock(&rqd->lock) ) { ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 07/22/2016 02:39 PM, Corneliu ZUZU wrote: > On 7/22/2016 1:29 PM, Razvan Cojocaru wrote: >> On 07/22/2016 01:17 PM, Corneliu ZUZU wrote: >>> On 7/22/2016 12:55 PM, Razvan Cojocaru wrote: On 07/22/2016 12:27 PM, Corneliu ZUZU wrote: > Hi, > > I've been inspecting vm-event code parts to try and understand when > and > why domain pausing/locking is done. If I understood correctly, domain > pausing is done solely to force all the vCPUs of that domain to see a > configuration update and act upon it (e.g. in the case of a > XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, > domain pausing/unpausing ensures immediate enabling of CR3 > load-exiting > for all vCPUs), not to synchronize concurrent operations > (lock-behavior). > > As for locking, I see that for example vm_event_init_domain(), > vm_event_cleanup_domain() and monitor_domctl() are all protected by > the > domctl-lock, but I don't think that's enough. > Here are a few code-paths that led me to believe that: > * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* > resources, but it doesn't seem to take the domctl lock, so it seems > possible for that to happen _while_ those resources are > initialized/cleaned-up > * monitor_guest_request() also calls > monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() > which attempts a vm_event_ring_lock(ved), which could also be called > _while_ that's initialized (vm_event_enable()) or cleaned-up > (vm_event_disable()) > * hvm_monitor_cr() - e.g. on the code-path > vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() > > > there doesn't seem to be taken into account the possibility of a > concurrent monitor_init_domain()/monitor_cleanup_domain() > > Am I missing something with these conclusions? Your conclusions look correct, but I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved. >>> I wouldn't think that was the rationale (considering that the risk is >>> crashing the hypervisor as a whole; also see below), I'd rather think >>> this simply went unnoticed. >>> Specifically, in a codebase where the choice between uint64_t and long int matters speed-wise, and where unlikely()s also matter, an extra lock may be an issue. The typical flow of an introspection application is: 1. Initialize everything. 2. Subscribe to relevant events. 3. Event processing loop. 4. Unsubscribe from events. 5. Do a last-run of event processing (already queued in the ring buffer). 6. Uninitialize everything (no events are possible here because of steps 4-5). >>> And even if an introspection application behaves as sanely as you say, >>> the current layout of the code still doesn't guarantee bug-free >>> behavior. That's because for one, there are in-guest events (that >>> trigger access of vm-events resources) that are completely asynchronous >>> in relation to 2-6, for example a HVMOP_guest_request_vm_event >>> (subsequently calling monitor_guest_request()). >> Not true. As part of step 4 d->monitor.guest_request_enabled will become >> 0, so no sensitive code will run after that: > > I wasn't being rigorous with 2-6 above (with that I only meant to say > 'there can't be any vm-events before everything is initialized') and it > seems true that at least in the monitor_guest_request() case it might > not be able to produce a hypervisor crash (although there are a number > of ugly inconsistencies on that code path). But nonetheless a hypervisor > crash can _still_ happen even with a _sane_ toolstack user. > > Look @ hvm_do_resume(): > > 1. If you, as a toolstack user, get at sane step no. 6: "Uninitialize > everything (no events are possible here because of steps 4-5)." > 2. But just before you do that, a hvm_do_resume() happens on an > arbitrary vCPU of the domain and gets to this point: > > if ( unlikely(v->arch.vm_event) ) > { > // ---> let's say you are now at this point, just after the > above non-NULL check > struct monitor_write_data *w = &v->arch.vm_event->write_data; > > 3. and before proceeding further, the uninitialization sequence > _completes_, i.e. this was done in vm_event_cleanup_domain(): > > for_each_vcpu ( d, v ) > { > xfree(v->arch.vm_event); > v->arch.vm_event = NULL; > } > > 4. and then in hvm_do_resume() this gets to be done: > > struct monitor_write_data *w = &v->arch.vm_event->write_data; > > [...] > > if ( w->do_write.msr ) > { > hvm_msr_write_intercept(w->msr, w->value, 0); > w->do_write.msr = 0; > } > > then you'll have a bea
Re: [Xen-devel] [xen-unstable test] 97737: regressions - FAIL
On Fri, 2016-07-22 at 11:49 +0100, Wei Liu wrote: > On Fri, Jul 22, 2016 at 03:27:30AM +, osstest service owner > wrote: > > > > flight 97737 xen-unstable real [real] > > http://logs.test-lab.xenproject.org/osstest/logs/97737/ > > > > Regressions :-( > > > > Tests which did not succeed and are blocking, > > including tests which could not be run: > > test-amd64-amd64-xl-credit2 19 guest-start/debian.repeat fail > > REGR. vs. 97664 > > test-armhf-armhf-xl 15 guest-start/debian.repeat fail > > REGR. vs. 97664 > > test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail > > REGR. vs. 97664 > Thanks from bringing my attention to this. I've seen it happening in local testing a few times also, and was investigating already. > From > http://logs.test-lab.xenproject.org/osstest/logs/97737/test-amd64-amd > 64-xl-credit2/serial-fiano0.log > > Jul 21 07:38:22.383917 (XEN) Assertion 'rqd->avgload >= 0 && rqd- > >b_avgload >= 0' failed at sched_credit2.c:734 > Right. My investigation shows that it is the b_avgload>=0 check that is actually failing, and looking at the code confirmed that, and I found out why. I've just sent a patch that I believe will fix this. I can't find it in the archives yet, but it should be: [PATCH] xen: credit2: don't let b_avgload go negative. msg-id: <146918909364.19443.6394900696027710502.st...@solace.fritz.box> Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 3:10 PM, Razvan Cojocaru wrote: On 07/22/2016 02:39 PM, Corneliu ZUZU wrote: On 7/22/2016 1:29 PM, Razvan Cojocaru wrote: On 07/22/2016 01:17 PM, Corneliu ZUZU wrote: On 7/22/2016 12:55 PM, Razvan Cojocaru wrote: On 07/22/2016 12:27 PM, Corneliu ZUZU wrote: Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions? Your conclusions look correct, but I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved. I wouldn't think that was the rationale (considering that the risk is crashing the hypervisor as a whole; also see below), I'd rather think this simply went unnoticed. Specifically, in a codebase where the choice between uint64_t and long int matters speed-wise, and where unlikely()s also matter, an extra lock may be an issue. The typical flow of an introspection application is: 1. Initialize everything. 2. Subscribe to relevant events. 3. Event processing loop. 4. Unsubscribe from events. 5. Do a last-run of event processing (already queued in the ring buffer). 6. Uninitialize everything (no events are possible here because of steps 4-5). And even if an introspection application behaves as sanely as you say, the current layout of the code still doesn't guarantee bug-free behavior. That's because for one, there are in-guest events (that trigger access of vm-events resources) that are completely asynchronous in relation to 2-6, for example a HVMOP_guest_request_vm_event (subsequently calling monitor_guest_request()). Not true. As part of step 4 d->monitor.guest_request_enabled will become 0, so no sensitive code will run after that: I wasn't being rigorous with 2-6 above (with that I only meant to say 'there can't be any vm-events before everything is initialized') and it seems true that at least in the monitor_guest_request() case it might not be able to produce a hypervisor crash (although there are a number of ugly inconsistencies on that code path). But nonetheless a hypervisor crash can _still_ happen even with a _sane_ toolstack user. Look @ hvm_do_resume(): 1. If you, as a toolstack user, get at sane step no. 6: "Uninitialize everything (no events are possible here because of steps 4-5)." 2. But just before you do that, a hvm_do_resume() happens on an arbitrary vCPU of the domain and gets to this point: if ( unlikely(v->arch.vm_event) ) { // ---> let's say you are now at this point, just after the above non-NULL check struct monitor_write_data *w = &v->arch.vm_event->write_data; 3. and before proceeding further, the uninitialization sequence _completes_, i.e. this was done in vm_event_cleanup_domain(): for_each_vcpu ( d, v ) { xfree(v->arch.vm_event); v->arch.vm_event = NULL; } 4. and then in hvm_do_resume() this gets to be done: struct monitor_write_data *w = &v->arch.vm_event->write_data; [...] if ( w->do_write.msr ) { hvm_msr_write_intercept(w->msr, w->value, 0); w->do_write.msr = 0; } then you'll have a beautiful NULL-pointer access hypervisor crash. As I've said before, of course I agree with that Sorry, I thought by your previous statement "I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved" you were inferring that with a 'sane' toolstack user, as you defined it, it won't be pos
Re: [Xen-devel] [PATCH] xen: credit2: don't let b_avgload go negative.
On 22/07/16 13:04, Dario Faggioli wrote: > The ASSERT() made effective by b5b5876619bd8ec2e > ("xen: credit2: fix two s_time_t handling issues > in load balancing") triggers for b_avgload (spotted > by OSSTest). > > b_avgload is where we store the prediction of how > the load of a runqueue will look like in the medium > to long term, because of a vcpu being added to or > removed from there. > > On vcpu removal, saturate down b_avgload to zero, > as it makes very few sense to predict that the > load of a runqueue will at some point become negative! > > Signed-off-by: Dario Faggioli Acked-by: George Dunlap Wei, do you want to apply it? Thanks, -George > --- > Cc: George Dunlap > Cc: Wei Liu > --- > xen/common/sched_credit2.c |8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index b92226c..1d79de0 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -1323,14 +1323,16 @@ runq_assign(const struct scheduler *ops, struct vcpu > *vc) > static void > __runq_deassign(struct csched2_vcpu *svc) > { > +struct csched2_runqueue_data *rqd = svc->rqd; > + > ASSERT(!__vcpu_on_runq(svc)); > ASSERT(!(svc->flags & CSFLAG_scheduled)); > > list_del_init(&svc->rqd_elem); > -update_max_weight(svc->rqd, 0, svc->weight); > +update_max_weight(rqd, 0, svc->weight); > > /* Expected new load based on removing this vcpu */ > -svc->rqd->b_avgload -= svc->avgload; > +rqd->b_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0); > > svc->rqd = NULL; > } > @@ -1590,7 +1592,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct > vcpu *vc) > if ( rqd == svc->rqd ) > { > if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) ) > -rqd_avgload = rqd->b_avgload - svc->avgload; > +rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, > 0); > } > else if ( spin_trylock(&rqd->lock) ) > { > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen: credit2: don't let b_avgload go negative.
On Fri, Jul 22, 2016 at 01:34:27PM +0100, George Dunlap wrote: > On 22/07/16 13:04, Dario Faggioli wrote: > > The ASSERT() made effective by b5b5876619bd8ec2e > > ("xen: credit2: fix two s_time_t handling issues > > in load balancing") triggers for b_avgload (spotted > > by OSSTest). > > > > b_avgload is where we store the prediction of how > > the load of a runqueue will look like in the medium > > to long term, because of a vcpu being added to or > > removed from there. > > > > On vcpu removal, saturate down b_avgload to zero, > > as it makes very few sense to predict that the > > load of a runqueue will at some point become negative! > > > > Signed-off-by: Dario Faggioli > > Acked-by: George Dunlap > > Wei, do you want to apply it? > I will apply it with your ack. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 07/22/2016 03:32 PM, Corneliu ZUZU wrote: >>> Look @ hvm_do_resume(): >>> >>> 1. If you, as a toolstack user, get at sane step no. 6: "Uninitialize >>> everything (no events are possible here because of steps 4-5)." >>> 2. But just before you do that, a hvm_do_resume() happens on an >>> arbitrary vCPU of the domain and gets to this point: >>> >>> if ( unlikely(v->arch.vm_event) ) >>> { >>> // ---> let's say you are now at this point, just after the >>> above non-NULL check >>> struct monitor_write_data *w = &v->arch.vm_event->write_data; >>> >>> 3. and before proceeding further, the uninitialization sequence >>> _completes_, i.e. this was done in vm_event_cleanup_domain(): >>> >>> for_each_vcpu ( d, v ) >>> { >>> xfree(v->arch.vm_event); >>> v->arch.vm_event = NULL; >>> } >>> >>> 4. and then in hvm_do_resume() this gets to be done: >>> >>> struct monitor_write_data *w = &v->arch.vm_event->write_data; >>> >>> [...] >>> >>> if ( w->do_write.msr ) >>> { >>> hvm_msr_write_intercept(w->msr, w->value, 0); >>> w->do_write.msr = 0; >>> } >>> >>> then you'll have a beautiful NULL-pointer access hypervisor crash. >> As I've said before, of course I agree with that > > Sorry, I thought by your previous statement "I assume that the reason > why this has not been addressed in the past is that introspection > applications are expected to be well-behaved" you were inferring that > with a 'sane' toolstack user, as you defined it, it won't be possible to > have a hypervisor crash. I did (please see below). >> - isn't that what a >> previous patch you've submitted addressed? > > I'm not sure what patch you're referring to, I don't remember ever > addressing these issues before.. I am talking about your patch "x86/monitor: fix: don't compromise a monitor_write_data with pending CR writes", which, if I'm not mistaken, addresses the above problem (since we won't xfree(v->arch.vm_event) on monitor uninit anymore). With the above problem fixed, the workflow I suggested should be fine. Again, I would prefer things to be rock solid, so personally I encourage your patch and hope we get it in. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 15/22] xen/arm: Don't call p2m_alloc_table from arch_domain_create
On 22/07/16 12:05, Sergej Proskurin wrote: On 07/22/2016 12:38 PM, Julien Grall wrote: On 22/07/16 11:39, Sergej Proskurin wrote: On 07/22/2016 12:26 PM, Julien Grall wrote: On 22/07/16 11:16, Sergej Proskurin wrote: Hi Julien, Hello, On 07/22/2016 11:18 AM, Julien Grall wrote: On 22/07/16 09:32, Sergej Proskurin wrote: Hi Julien, Hello Sergej, -int p2m_alloc_table(struct domain *d) +static int p2m_alloc_table(struct domain *d) While moving parts of the altp2m code out of ./xen/arch/arm/p2m.c, the function p2m_alloc_table needs to be called from ./xen/arch/arm/altp2m.c to allocate the individual altp2m views. Hence it should not be static. No, this function should not be called outside p2m.c as it will not fully initialize the p2m. You need to need to provide a function to initialize a p2m (such as p2m_init). The last time we have discussed reusing existing code, among others, for individual struct p2m_domain initialization routines. Also, we have agreed to move altp2m-related parts out of p2m.c into altp2m.c, which makes it hard not to access parts required for initialization/teardown (that are equal for both p2m and altp2m). I remember this discussion. However, the p2m initialize/teardown should exactly be the same for the hostp2m and altp2m (except for the type of the p2m). So, a function should be provided to initialize a full p2m to avoid code duplication. This is exactly what has been done. Nevertheless, altp2m views are somewhat more dynamic than hostp2m and hence require frequent initialization/teardown of individual views. That is, by moving altp2m parts out of p2m.c we simply need to access this shared code multiple times at runtime from different altp2m-related functions. This applies to more functions that just to p2m_alloc_table. I am not convinced that you need to reallocate the root page every time rather than clearing them. Anyway, I will need to see the code to understand what is done. Regards, In the following, you will find an excerpt of both the p2m.c and altp2m.c concerning the (alt)p2m initialization. Please note that altp2m_init_helper is called from different initialization routines in altp2m.c. Also, please consider that this code has not yet been ported to your recent patches. Please let me know if you need more information. static int altp2m_init_helper(struct domain *d, unsigned int idx) { int rc; struct p2m_domain *p2m = d->arch.altp2m_p2m[idx]; if ( p2m == NULL ) { /* Allocate a new altp2m view. */ p2m = xzalloc(struct p2m_domain); if ( p2m == NULL) { rc = -ENOMEM; goto err; } memset(p2m, 0, sizeof(struct p2m_domain)); } /* Initialize the new altp2m view. */ rc = p2m_init_one(d, p2m); if ( rc ) goto err; /* Allocate a root table for the altp2m view. */ rc = p2m_alloc_table(p2m); This patch moved p2m_alloc_table call into p2m_init (i.e your p2m_init_one). You complained that the function was not exported anymore, but you did not look how it was called in this patch. if ( rc ) goto err; p2m->p2m_class = p2m_alternate; p2m->access_required = 1; _atomic_set(&p2m->active_vcpus, 0); d->arch.altp2m_p2m[idx] = p2m; d->arch.altp2m_vttbr[idx] = p2m->vttbr.vttbr; Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [qemu-mainline baseline-only test] 66672: regressions - FAIL
This run is configured for baseline tests only. flight 66672 qemu-mainline real [real] http://osstest.xs.citrite.net/~osstest/testlogs/logs/66672/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64 5 xen-build fail REGR. vs. 66623 build-i386-xsm5 xen-build fail REGR. vs. 66623 build-amd64-xsm 5 xen-build fail REGR. vs. 66623 build-i3865 xen-build fail REGR. vs. 66623 test-armhf-armhf-libvirt-qcow2 9 debian-di-install fail REGR. vs. 66623 Tests which did not succeed, but are not blocking: build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-pvh-intel 1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-xl-pvh-amd 1 build-check(1) blocked n/a test-amd64-amd64-xl-multivcpu 1 build-check(1) blocked n/a test-amd64-amd64-xl 1 build-check(1) blocked n/a test-amd64-amd64-xl-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-qemuu-rhel6hvm-amd 1 build-check(1) blocked n/a test-amd64-i386-freebsd10-amd64 1 build-check(1) blocked n/a test-amd64-amd64-xl-rtds 1 build-check(1) blocked n/a test-amd64-amd64-qemuu-nested-intel 1 build-check(1) blocked n/a test-amd64-amd64-i386-pvgrub 1 build-check(1) blocked n/a test-amd64-i386-xl-xsm1 build-check(1) blocked n/a test-amd64-i386-freebsd10-i386 1 build-check(1) blocked n/a test-amd64-amd64-amd64-pvgrub 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-debianhvm-amd64 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-pygrub 1 build-check(1) blocked n/a test-amd64-i386-qemuu-rhel6hvm-intel 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-raw1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-xl-credit2 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-winxpsp3 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1)blocked n/a test-amd64-amd64-xl-qemuu-debianhvm-amd64 1 build-check(1)blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-qcow2 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-amd64-qemuu-nested-amd 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-winxpsp3 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 14 guest-saverestorefail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 guest-saverestorefail never pass test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-midway 13 saverestore-support-checkfail never pass test-armhf-armhf-xl
Re: [Xen-devel] [PATCH 01/19] xen: Create a new file xen_pvdev.c
sorry for the bad format from web email, and later review (neo training in new company).. patch 6 -- patch 12, rename * patch, are good to me. Quan --From:Emil Condrea Time:2016 Jul 19 (Tue) 00:54To:Eric Blake Cc:qemu-devel ; Daniel De Graaf ; xen-devel ; Stefano Stabellini ; Quan Xu ; wei.liu2 ; stefanb ; anthony.perard Subject:Re: [Xen-devel] [PATCH 01/19] xen: Create a new file xen_pvdev.c Eric, this is the link to the original patch which is well formatted: http://marc.info/?l=xen-devel&m=146815138831762&w=2I think that the formatting and s-o-b was broken in the reply from Quan.On Jul 18, 2016 17:57, "Eric Blake" wrote: On 07/17/2016 01:41 AM, Quan Xu wrote: > > [Quan:]: comment starts with [Quan:] > This line doesn't belong in a commit message; it's fine to put it after the --- separator though, if it aids mailing list reviewers. > > The purpose of the new file is to store generic functions shared by > frontendand backends such as xenstore operations, xendevs. > s/frontendand/front end and/ Please wrap your commit message lines. Since 'git log' displays logs with indentation, wrapping around 72 characters is ideal. > Signed-off-by: Quan Xu > Signed-off-by: Emil Condrea These are not valid S-o-b, therefore this patch cannot be applied as-is. > -int xenstore_read_int(const char *base, const char *node, int *ival) > -{ > - char *val; > - int rc = -1; > - > - val = xenstore_read_str(base, node); > [Quan:]: IMO, it is better to initialize val when declares. the same > comment for the other 'val' > - if (val && 1 == sscanf(val, "%d", ival)) { This is not a valid patch. Are you replying to a patch that someone else posted? If so, your quoting style is VERY difficult to read. Please consider using a leading > before every line that you are quoting (rather than pasting it verbatim as if you had written it), and include a blank line both before and after every line that you insert, to call visual attention to what is your reply vs. what you are quoting. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 2/7] xen/arm: cpufeature: Provide an helper to check if a capability is supported
On Wed, Jul 20, 2016 at 04:25:55PM +0100, Julien Grall wrote: > The CPU capabilities will be set depending on the value found in the CPU > registers. This patch provides a generic to go through a set of capabilities > and find which one should be enabled. > > The parameter "info" is used to display the kind of capability updated (e.g > workaround, feature...). > > Signed-off-by: Julien Grall > Acked-by: Stefano Stabellini Reviewed-by: Konrad Rzeszutek Wilk with just one minor suggestion: > > --- > Changes in v4: > - Add Stefano's acked-by > > Changes in v3: > - Patch added. The code was previously part of "Detect > silicon...". > --- > xen/arch/arm/cpufeature.c| 16 > xen/include/asm-arm/cpufeature.h | 9 + > 2 files changed, 25 insertions(+) > > diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c > index 7a1b56b..088625b 100644 > --- a/xen/arch/arm/cpufeature.c > +++ b/xen/arch/arm/cpufeature.c > @@ -24,6 +24,22 @@ > > DECLARE_BITMAP(cpu_hwcaps, ARM_NCAPS); > > +void update_cpu_capabilities(const struct arm_cpu_capabilities *caps, > + const char *info) > +{ > +int i; > + > +for ( i = 0; caps[i].matches; i++ ) > +{ > +if ( !caps[i].matches(&caps[i]) ) So what if the 'struct arm_cpu_capabilitues' has '->matches' set to NULL? Perhaps: if ( !caps[i].matches || !caps[i]... ? > +continue; > + > +if ( !cpus_have_cap(caps[i].capability) && caps[i].desc ) > +printk(XENLOG_INFO "%s: %s\n", info, caps[i].desc); > +cpus_set_cap(caps[i].capability); > +} > +} > + > /* > * Local variables: > * mode: C > diff --git a/xen/include/asm-arm/cpufeature.h > b/xen/include/asm-arm/cpufeature.h > index 2bebad1..be2414c 100644 > --- a/xen/include/asm-arm/cpufeature.h > +++ b/xen/include/asm-arm/cpufeature.h > @@ -62,6 +62,15 @@ static inline void cpus_set_cap(unsigned int num) > __set_bit(num, cpu_hwcaps); > } > > +struct arm_cpu_capabilities { > +const char *desc; > +u16 capability; > +bool_t (*matches)(const struct arm_cpu_capabilities *); > +}; > + > +void update_cpu_capabilities(const struct arm_cpu_capabilities *caps, > + const char *info); > + > #endif /* __ASSEMBLY__ */ > > #endif > -- > 1.9.1 > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 1/7] xen/arm: Introduce alternative runtime patching
> diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c > new file mode 100644 > index 000..d00f98e > --- /dev/null > +++ b/xen/arch/arm/alternative.c Hey! I've some comments, most of them are light. What I am concerned most is the difference between comments in the header vs the code. Looking at Linux code it seems to have the same issue - and I know you don't want to differ to much from Linux. Perhaps go with the changes as is (With the comments being off) and the then have some patches for Linux for the header mismatches and those can be backported? ..snip.. > +static int __apply_alternatives(const struct alt_region *region) > +{ > +const struct alt_instr *alt; > +const u32 *origptr, *replptr; > +u32 *writeptr, *writemap; > +mfn_t text_mfn = _mfn(virt_to_mfn(_stext)); > +unsigned int text_order = get_order_from_bytes(_end - _start); > + > +printk("Patching kernel code\n"); I see you use prefixed later with 'alternatives:' and also this is without any prefix (DEBUG, or ERR). Would it make sense to have this one above be dprintk while the one below with printk (KERN_ERR) ? > + > +/* > + * The text section is read-only. So re-map Xen to be able to patch > + * the code. > + */ > +writemap = __vmap(&text_mfn, 1 << text_order, 1, 1, PAGE_HYPERVISOR, > + VMAP_DEFAULT); > +if ( !writemap ) > +{ > +printk("alternatives: Unable to map the text section\n"); Perhaps include the size of the text section? That may help in figuring out what went wrong or if one could increase the VMAP area? > +return -ENOMEM; > +} > + > +for ( alt = region->begin; alt < region->end; alt++ ) > +{ > +u32 insn; > +int i, nr_inst; > + > +if ( !cpus_have_cap(alt->cpufeature) ) > +continue; > + > +BUG_ON(alt->alt_len != alt->orig_len); The header file says: + u8 alt_len;/* size of new instruction(s), <= orig_len */ So this BUG_ON seems incorrect? But hten later in the header it says: "1. Be exactly the same length (in bytes) as the default code + *sequence." So the BUG_ON is correct. Perhaps the 'alt_len' comment should be s/<=/==/ ? > + > +origptr = ALT_ORIG_PTR(alt); > +writeptr = origptr - (u32 *)_start + writemap; How about just using writeptr += ? Ah wait. You are trying to preserve the Linux code!. Nevermind then. > +replptr = ALT_REPL_PTR(alt); > + > +nr_inst = alt->alt_len / sizeof(insn); > + > +for ( i = 0; i < nr_inst; i++ ) > +{ > +insn = get_alt_insn(alt, origptr + i, replptr + i); > +*(writeptr + i) = cpu_to_le32(insn); > +} > + > +/* Ensure the new instructions reached the memory and nuke */ > +clean_and_invalidate_dcache_va_range(writeptr, > + (sizeof (*writeptr) * nr_inst)); > +} > + > +/* Nuke the instruction cache */ > +invalidate_icache(); > + > +vunmap(writemap); > + > +return 0; > +} > + > +/* > + * We might be patching the stop_machine state machine, so implement a > + * really simple polling protocol here. > + */ > +static int __apply_alternatives_multi_stop(void *unused) > +{ > +static int patched = 0; Shouldn't this be 'atomic_t' ? > +struct alt_region region = { Can this 'const' ? > +.begin = __alt_instructions, > +.end = __alt_instructions_end, > +}; > + > +/* We always have a CPU 0 at this point (__init) */ > +if ( smp_processor_id() ) > +{ > +while ( !read_atomic(&patched) ) > +cpu_relax(); > +isb(); > +} > +else > +{ > +int ret; > + > +BUG_ON(patched); > +ret = __apply_alternatives(®ion); > +/* The patching is not expected to fail during boot. */ > +BUG_ON(ret != 0); > + > +/* Barriers provided by the cache flushing */ > +write_atomic(&patched, 1); > +} > + > +return 0; > +} > + > +/* > + * This function should only be called during boot and before CPU0 jump > + * into the idle_loop. So could you add an: > + */ > +void __init apply_alternatives_all(void) > +{ > +int ret; > + ASSERT(system_state != SYS_STATE_active); ? > + /* better not try code patching on a live SMP system */ > +ret = stop_machine_run(__apply_alternatives_multi_stop, NULL, NR_CPUS); > + > +/* stop_machine_run should never fail at this stage of the boot */ > +BUG_ON(ret); > +} > + > +int apply_alternatives(void *start, size_t length) > +{ > +struct alt_region region = { Could this be 'const'? > +.begin = start, > +.end = start + length, > +}; > + > +return __apply_alternatives(®ion); > +} > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * indent-tabs-mode: nil > + * End: > + */ > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c > index
Re: [Xen-devel] [PATCH v5 3/7] xen/arm: Detect silicon revision and set cap bits accordingly
On Wed, Jul 20, 2016 at 04:25:56PM +0100, Julien Grall wrote: > After each CPU has been started, we iterate through a list of CPU > errata to detect CPUs which need from hypervisor code patches. > > For each bug there is a function which check if that a particular CPU is s/check/checks/ > affected. This needs to be done on every CPUs to cover heterogenous s/CPUs/CPU/ > system properly. s/system/systems/ ? (not sure) > > If a certain erratum has been detected, the capability bit will be set. > In the case the erratum requires code patching, this will be triggered > by the call to apply_alternatives. s/the/// > > The code is based on the file arch/arm64/kernel/cpu_errata.c in Linux > v4.6-rc3. > > Signed-off-by: Julien Grall > Acked-by: Stefano Stabellini > > --- > Changes in v4: > - Add missing emacs magic blocks > - Add Stefano's acked-by > > Changes in v3: > - Move update_cpu_capabilities in a separate patch > - Update the commit message to mention that workaround may > not require code patching. > > Changes in v2: > - Use XENLOG_INFO for the loglevel of the message > --- > xen/arch/arm/Makefile| 1 + > xen/arch/arm/cpuerrata.c | 34 ++ > xen/arch/arm/setup.c | 3 +++ > xen/arch/arm/smpboot.c | 3 +++ > xen/include/asm-arm/cpuerrata.h | 14 ++ > xen/include/asm-arm/cpufeature.h | 6 ++ > 6 files changed, 61 insertions(+) > create mode 100644 xen/arch/arm/cpuerrata.c > create mode 100644 xen/include/asm-arm/cpuerrata.h > > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile > index 74bd7b8..23aaf52 100644 > --- a/xen/arch/arm/Makefile > +++ b/xen/arch/arm/Makefile > @@ -7,6 +7,7 @@ subdir-$(CONFIG_ACPI) += acpi > obj-$(CONFIG_ALTERNATIVE) += alternative.o > obj-y += bootfdt.o > obj-y += cpu.o > +obj-y += cpuerrata.o > obj-y += cpufeature.o > obj-y += decode.o > obj-y += device.o > diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c > new file mode 100644 > index 000..03ae7b4 > --- /dev/null > +++ b/xen/arch/arm/cpuerrata.c > @@ -0,0 +1,34 @@ > +#include > +#include > +#include > + > +#define MIDR_RANGE(model, min, max) \ > +.matches = is_affected_midr_range, \ > +.midr_model = model,\ > +.midr_range_min = min, \ > +.midr_range_max = max > + > +static bool_t __maybe_unused > +is_affected_midr_range(const struct arm_cpu_capabilities *entry) > +{ > +return MIDR_IS_CPU_MODEL_RANGE(boot_cpu_data.midr.bits, > entry->midr_model, > + entry->midr_range_min, > + entry->midr_range_max); > +} > + > +static const struct arm_cpu_capabilities arm_errata[] = { > +{}, > +}; > + > +void check_local_cpu_errata(void) > +{ > +update_cpu_capabilities(arm_errata, "enabled workaround for"); > +} > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * indent-tabs-mode: nil > + * End: > + */ > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c > index 97b3214..38eb888 100644 > --- a/xen/arch/arm/setup.c > +++ b/xen/arch/arm/setup.c > @@ -43,6 +43,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -171,6 +172,8 @@ static void __init processor_id(void) > } > > processor_setup(); > + > +check_local_cpu_errata(); > } > > void dt_unreserved_regions(paddr_t s, paddr_t e, > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c > index 3a962f7..d56b91d 100644 > --- a/xen/arch/arm/smpboot.c > +++ b/xen/arch/arm/smpboot.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -316,6 +317,8 @@ void start_secondary(unsigned long boot_phys_offset, > local_irq_enable(); > local_abort_enable(); > > +check_local_cpu_errata(); > + > printk(XENLOG_DEBUG "CPU %u booted.\n", smp_processor_id()); > > startup_cpu_idle_loop(); > diff --git a/xen/include/asm-arm/cpuerrata.h b/xen/include/asm-arm/cpuerrata.h > new file mode 100644 > index 000..fe93beb > --- /dev/null > +++ b/xen/include/asm-arm/cpuerrata.h > @@ -0,0 +1,14 @@ > +#ifndef __ARM_CPUERRATA_H Sorry about having engaged the pedantic review mode, but this caught my eye. I thought the style was to prefix it with __ and also postfix it with __: $ grep "__" *.h decode.h:#ifndef __ARCH_ARM_DECODE_H_ decode.h:#define __ARCH_ARM_DECODE_H_ decode.h:#endif /* __ARCH_ARM_DECODE_H_ */ kernel.h:#ifndef __ARCH_ARM_KERNEL_H__ kernel.h:#define __ARCH_ARM_KERNEL_H__ kernel.h:#endif /* #ifdef __ARCH_ARM_KERNEL_H__ */ vtimer.h:#ifndef __ARCH_ARM_VTIMER_H__ vtimer.h:#define __ARCH_ARM_VTIMER_H__ vuart.h:#ifndef __ARCH_ARM_VUART_H__ vuart.h:#define __ARCH_ARM_VUART_H__ vuart.h:#endif /* __ARCH_ARM_VUART_H__ */ And the include/asm also has have this in in surplus. It rea
Re: [Xen-devel] [PATCH 01/19] xen: Create a new file xen_pvdev.c
Anthony, thanks for your explaination.IMO, patch 1 and patch 2 need your detailed review.. IMO the reset patches are good in general..Emil, if patch 1 / patch 2 are reviewed from anthony, could you send out v10? :) i know it's not an easy task, thanks in advence!! Quan On Mon, 18 Jul 2016 15:50:27 +0100, anthony.perard wrote:> On Sun, Jul 17, 2016 at 03:41:26PM +0800, Quan Xu wrote: > -int xenstore_write_int(const char *base, const char *node, int ival) > -{ > -char val[12]; > - > [Quan:]: why 12 ? what about XEN_BUFSIZE? That is the number of digit in INT_MAX (10) + 1 for the sign + 1 for '\0'. > -snprintf(val, sizeof(val), "%d", ival); > -return xenstore_write_str(base, node, val); > -} > - > -int xenstore_write_int64(const char *base, const char *node, int64_t ival) > -{ > -char val[21]; > - > [Quan:]: why 21 ? what about XEN_BUFSIZE? Same with INT64_MAX (19 digits). > > -snprintf(val, sizeof(val), "%"PRId64, ival); > -return xenstore_write_str(base, node, val); > -} > - > -int xenstore_read_int(const char *base, const char *node, int *ival) > -{ > -char *val; > -int rc = -1; > - > -val = xenstore_read_str(base, node); > [Quan:]: IMO, it is better to initialize val when declares. I think I prefer it this way. > -if (val && 1 == sscanf(val, "%d", ival)) { > -rc = 0; > -} > -g_free(val); > -return rc; > -} -- Anthony PERARD___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 4/7] xen/arm: Document the errata implemented in Xen
On Wed, Jul 20, 2016 at 04:25:57PM +0100, Julien Grall wrote: > The new document will help to keep track of each erratum Xen is able to > handle. > > The text is based on the Linux doc in Documents/arm64/silicon-errata.txt. > > Also list the current errata that Xen is aware of. > > Signed-off-by: Julien Grall > Acked-by: Stefano Stabellini Reviewed-by: Konrad Rzeszutek Wilk > > --- > Changes in v4: > - Fix grammar in the document : s/by ARM64/on ARM64/ > - Add Stefano's acked-by > > Changes in v3: > - Fix grammar in the commit message > - s/Linux/Xen/ > - Mention that runtime patching is only supported by ARM64 > --- > docs/misc/arm/silicon-errata.txt | 45 > > 1 file changed, 45 insertions(+) > create mode 100644 docs/misc/arm/silicon-errata.txt > > diff --git a/docs/misc/arm/silicon-errata.txt > b/docs/misc/arm/silicon-errata.txt > new file mode 100644 > index 000..5d54694 > --- /dev/null > +++ b/docs/misc/arm/silicon-errata.txt > @@ -0,0 +1,45 @@ > +Silicon Errata and Software Workarounds > +=== > + > +It is an unfortunate fact of life that hardware is often produced with > +so-called "errata", which can cause it to deviate from the architecture > +under specific circumstances. For hardware produced by ARM, these > +errata are broadly classified into the following categories: > + > + Category A: A critical error without a viable workaround. > + Category B: A significant or critical error with an acceptable > + workaround. > + Category C: A minor error that is not expected to occur under normal > + operation. > + > +For more information, consult one of the "Software Developers Errata > +Notice" documents available on infocenter.arm.com (registration > +required). > + > +As far as Xen is concerned, Category B errata may require some special > +treatment in the hypervisor. For example, avoiding a particular sequence > +of code, or configuring the processor in a particular way. A less common > +situation may require similar actions in order to declassify a Category A > +erratum into a Category C erratum. These are collectively known as > +"software workarounds" and are only required in the minority of cases > +(e.g. those cases that both require a non-secure workaround *and* can > +be triggered by Xen). > + > +For software workarounds that may adversely impact systems unaffected by > +the erratum in question, a Kconfig entry is added under "ARM errata > +workarounds via the alternatives framework". These are enabled by default > +and patched in at runtime when an affected CPU is detected. Note that > +runtime patching is only supported on ARM64. For less-intrusive workarounds, > +a Kconfig option is not available and the code is structured (preferably > +with a comment) in such a way that the erratum will not be hit. > + > +This approach can make it slightly onerous to determine exactly which > +errata are worked around in an arbitrary hypervisor source tree, so this > +file acts as a registry of software workarounds in the Xen hypervisor and > +will be updated when new workarounds are committed and backported to > +stable hypervisors. > + > +| Implementor| Component | Erratum ID | Kconfig > | > +++-+-+-+ > +| ARM| Cortex-A15 | #766422 | N/A > | > +| ARM| Cortex-A57 | #852523 | N/A > | > -- > 1.9.1 > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 01/19] xen: Create a new file xen_pvdev.c
Sure, I will continue to send revisions until it is approved upstream. On Jul 22, 2016 5:24 PM, "Quan Xu" wrote: > Anthony, thanks for your explaination. > IMO, patch 1 and patch 2 need your detailed review.. IMO the reset > patches are good in general.. > Emil, if patch 1 / patch 2 are reviewed from anthony, could you send out > v10? :) i know it's not an easy task, thanks in advence!! > > Quan > > > On Mon, 18 Jul 2016 15:50:27 +0100, anthony.perard< > anthony.per...@citrix.com> wrote: > > On Sun, Jul 17, 2016 at 03:41:26PM +0800, Quan Xu wrote: > > -int xenstore_write_int(const char *base, const char *node, int ival) > > -{ > > -char val[12]; > > - > > [Quan:]: why 12 ? what about XEN_BUFSIZE? > > That is the number of digit in INT_MAX (10) + 1 for the sign + 1 for '\0'. > > > -snprintf(val, sizeof(val), "%d", ival); > > -return xenstore_write_str(base, node, val); > > -} > > - > > > -int xenstore_write_int64(const char *base, const char *node, int64_t ival) > > -{ > > -char val[21]; > > - > > [Quan:]: why 21 ? what about XEN_BUFSIZE? > > Same with INT64_MAX (19 digits). > > > > > -snprintf(val, sizeof(val), "%"PRId64, ival); > > -return xenstore_write_str(base, node, val); > > -} > > - > > -int xenstore_read_int(const char *base, const char *node, int *ival) > > -{ > > -char *val; > > -int rc = -1; > > - > > -val = xenstore_read_str(base, node); > > [Quan:]: IMO, it is better to initialize val when declares. > > I think I prefer it this way. > > > -if (val && 1 == sscanf(val, "%d", ival)) { > > -rc = 0; > > -} > > -g_free(val); > > -return rc; > > -} > > -- > Anthony PERARD > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 0/7] xen/arm: Introduce alternative runtime patching for ARM64
On Wed, Jul 20, 2016 at 04:25:53PM +0100, Julien Grall wrote: > Hello, > > Some of the processor errata will require to modify code sequence. As those > modifications may impact the performance, they should only be enabled on > affected cores. Furthermore, Xen may also want to take advantage of > new hardware features coming up with v8.1 and v8.2. > > The first part of the series adds the alternative infrastructure, most of > the code is based on Linux v4.6-rc3. The rest of the series implements > errata for Cortex-A57 and Cortex-A53. > > A branch with all the patches can be found here: > > git://xenbits.xen.org/people/julieng/xen-unstable.git branch alternative-v5 > > For all the changes, see in each patch. Hey! I took a look at the patches. Only had some brief comments. I didn't review the last three patches as I am not that familiar with the instructions. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/9] xen/arm: Provide macros to help creating workaround helpers
On Wed, Jul 20, 2016 at 8:43 AM, Julien Grall wrote: > Hi Konrad, > >>> For instance, the line bellow will create a workaround helper for >>> erratum #424242 which is enabled when the capability >>> ARM64_WORKAROUND_424242 is set and only available for ARM64: 42, eh? >>> >>> CHECK_WORKAROUND_HELPER(424242, ARM64_WORKAROUND_42424242, CONFIG_ARM64) >>> >>> Signed-off-by: Julien Grall >> >> >> It looks good to me. CC'ing Konrad which is more knowledgeable than me >> about ALTERNATIVE. > > > Do you have any opinions on this patch? Yes I do: Reviewed-by: Konrad Rzeszutek Wilk ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] "X86_PV_VCPU_MSRS record truncated" during domain restore
The patches (that are attached to "[PATCH RFC 0/4] Fix issues with zero-length records in migration v2") work! I patched locally qubes-builder to import your patches and recreate rpm files. These patches also work on Xen 4.6.1. Best regards, Massimo On 07/21/2016 10:53 AM, Andrew Cooper wrote: On 21/07/2016 09:39, Massimo Colombi wrote: This is the output of verify-stream-v2: [user@dom0 scripts]$ sudo xl save fedora-23-dvm /fedora-23-dvm-savefile Saving to /fedora-23-dvm-savefile new xl format (info 0x3/0x0/1991) xc: info: Saving domain 4, type x86 PV xc: Frames: 912384/912384 100% xc: End of stream: 0/00% [user@dom0 scripts]$ ./verify-stream-v2 -f xl -i /fedora-23-dvm-savefile Stream Error: Traceback (most recent call last): File "./verify-stream-v2", line 82, in read_stream VerifyLibxl(info, stream_read).verify() File "/usr/lib64/python2.7/site-packages/xen/migration/libxl.py", line 82, in verify while self.verify_record() != REC_TYPE_end: File "/usr/lib64/python2.7/site-packages/xen/migration/libxl.py", line 136, in verify_record record_verifiers[rtype](self, content[:length]) File "/usr/lib64/python2.7/site-packages/xen/migration/libxl.py", line 155, in verify_record_libxc_context VerifyLibxc(self.info, self.read).verify() File "/usr/lib64/python2.7/site-packages/xen/migration/libxc.py", line 132, in verify while self.verify_record() != REC_TYPE_end: File "/usr/lib64/python2.7/site-packages/xen/migration/libxc.py", line 227, in verify_record record_verifiers[rtype](self, content[:length]) File "/usr/lib64/python2.7/site-packages/xen/migration/libxc.py", line 429, in VerifyLibxc.verify_record_x86_pv_vcpu_generic(s, x, "msrs"), File "/usr/lib64/python2.7/site-packages/xen/migration/libxc.py", line 323, in verify_record_x86_pv_vcpu_generic " bytes long" % (name, minsz)) RecordError: X86_PV_VCPU_msrs record length must be at least 8 bytes long On 07/21/2016 02:10 AM, Andrew Cooper wrote: Is it possible to do an `xl save` equivalent on the domain, and run tools/python/scripts/verify-stream-v2 against the resulting file? That should identify whether it is a malformed X86_PV_VCPU_MSRS record but otherwise intact stream, or whether the stream becomes corrupted elsewhere? Thanks for your explanation of the bug. I should also improve the reported error message. Do you mind rerunning with an extra -v to dump a list of the records found in the stream? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 2/4] tools: split out xenstored starting form xencommons
In order to prepare starting a xenstore domain split out the starting of the xenstore daemon from the xencommons script into a dedicated launch-xenstore script. Correct one error: don't remove old tdb files in background, as this could lead to very subtle races. A rerun of autogen.sh is required. Signed-off-by: Juergen Gross --- .gitignore | 1 + tools/configure | 3 +- tools/configure.ac | 1 + tools/hotplug/Linux/Makefile | 1 + tools/hotplug/Linux/init.d/xencommons.in | 36 ++--- tools/hotplug/Linux/launch-xenstore.in | 55 6 files changed, 63 insertions(+), 34 deletions(-) create mode 100644 tools/hotplug/Linux/launch-xenstore.in diff --git a/.gitignore b/.gitignore index 9b8dece..d193820 100644 --- a/.gitignore +++ b/.gitignore @@ -157,6 +157,7 @@ tools/hotplug/Linux/init.d/xen-watchdog tools/hotplug/Linux/init.d/xencommons tools/hotplug/Linux/init.d/xendomains tools/hotplug/Linux/init.d/xendriverdomain +tools/hotplug/Linux/launch-xenstore tools/hotplug/Linux/systemd/*.conf tools/hotplug/Linux/systemd/*.mount tools/hotplug/Linux/systemd/*.socket diff --git a/tools/configure b/tools/configure index 1c84c6c..51f16c5 100755 --- a/tools/configure +++ b/tools/configure @@ -2410,7 +2410,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu -ac_config_files="$ac_config_files ../config/Tools.mk hotplug/FreeBSD/rc.d/xencommons hotplug/FreeBSD/rc.d/xendriverdomain hotplug/Linux/init.d/sysconfig.xencommons hotplug/Linux/init.d/sysconfig.xendomains hotplug/Linux/init.d/xen-watchdog hotplug/Linux/init.d/xencommons hotplug/Linux/init.d/xendomains hotplug/Linux/init.d/xendriverdomain hotplug/Linux/vif-setup hotplug/Linux/xen-hotplug-common.sh hotplug/Linux/xendomains hotplug/NetBSD/rc.d/xencommons hotplug/NetBSD/rc.d/xendriverdomain libxl/xenlight.pc.in libxl/xlutil.pc.in ocaml/xenstored/oxenstored.conf" +ac_config_files="$ac_config_files ../config/Tools.mk hotplug/FreeBSD/rc.d/xencommons hotplug/FreeBSD/rc.d/xendriverdomain hotplug/Linux/init.d/sysconfig.xencommons hotplug/Linux/init.d/sysconfig.xendomains hotplug/Linux/init.d/xen-watchdog hotplug/Linux/init.d/xencommons hotplug/Linux/init.d/xendomains hotplug/Linux/init.d/xendriverdomain hotplug/Linux/launch-xenstore hotplug/Linux/vif-setup hotplug/Linux/xen-hotplug-common.sh hotplug/Linux/xendomains hotplug/NetBSD/rc.d/xencommons hotplug/NetBSD/rc.d/xendriverdomain libxl/xenlight.pc.in libxl/xlutil.pc.in ocaml/xenstored/oxenstored.conf" ac_config_headers="$ac_config_headers config.h" @@ -10376,6 +10376,7 @@ do "hotplug/Linux/init.d/xencommons") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/init.d/xencommons" ;; "hotplug/Linux/init.d/xendomains") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/init.d/xendomains" ;; "hotplug/Linux/init.d/xendriverdomain") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/init.d/xendriverdomain" ;; +"hotplug/Linux/launch-xenstore") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/launch-xenstore" ;; "hotplug/Linux/vif-setup") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/vif-setup" ;; "hotplug/Linux/xen-hotplug-common.sh") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/xen-hotplug-common.sh" ;; "hotplug/Linux/xendomains") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/xendomains" ;; diff --git a/tools/configure.ac b/tools/configure.ac index f991ab3..3a4abb5 100644 --- a/tools/configure.ac +++ b/tools/configure.ac @@ -15,6 +15,7 @@ hotplug/Linux/init.d/xen-watchdog hotplug/Linux/init.d/xencommons hotplug/Linux/init.d/xendomains hotplug/Linux/init.d/xendriverdomain +hotplug/Linux/launch-xenstore hotplug/Linux/vif-setup hotplug/Linux/xen-hotplug-common.sh hotplug/Linux/xendomains diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile index 6d6ccee..29280cb 100644 --- a/tools/hotplug/Linux/Makefile +++ b/tools/hotplug/Linux/Makefile @@ -30,6 +30,7 @@ XEN_SCRIPTS += block-drbd-probe XEN_SCRIPTS += block-dummy XEN_SCRIPTS += $(XEN_SCRIPTS-y) XEN_SCRIPTS += colo-proxy-setup +XEN_SCRIPTS += launch-xenstore SUBDIRS-$(CONFIG_SYSTEMD) += systemd diff --git a/tools/hotplug/Linux/init.d/xencommons.in b/tools/hotplug/Linux/init.d/xencommons.in index 2d8f30b..a32608c 100644 --- a/tools/hotplug/Linux/init.d/xencommons.in +++ b/tools/hotplug/Linux/init.d/xencommons.in @@ -18,7 +18,6 @@ # Description: Starts and stops the daemons neeeded for xl/xend ### END INIT INFO -XENSTORED=@XENSTORED@ BACKEND_MODULES="@LINUX_BACKEND_MODULES@" . @XEN_SCRIPT_DIR@/hotplugpath.sh @@ -53,8 +52,6 @@ if test -f /proc/xen/capabilities && \ fi do_start () { -local time=0 - local timeout=30 local mod for mod in $BACKEND_MODULES ; do modprobe "$mod" &>/dev/null ; done @@ -62,37 +59,10 @@ do_start () { mkdir -p ${XEN_RUN_DIR} mkdir -p ${XEN_LOCK_DIR} - if ! `${bindir}/xenstore-read -s / >/dev/nul
[Xen-devel] [PATCH v3 0/4] tools: make xenstore domain/daemon configurable
Add a configuration option to /etc/sysconfig/xencommons to let the user configure whether he wants to start xenstore service as a daemon or as a stubdom. Changes in V3: - patch 1: re-add sd_notify() call - split up former patch 2 into 3 patches as requested by Ian Jackson - patch 4 (was 2): remove check for running xenstore domain, as this is done in init-xenstore-domain already - patch 4 (was 2): if booted with systemd send a systemd-notify message in the xenstore domain case - patch 4 (was 2): if booted with systemd don't wait until xenstore daemon is running, as the daemon will have sent a notify message by its own Changes in V2: - move service type modification form patch 2 to patch 1 as implied by Ross Lagerwall (at least I guess so) - add .gitignore entry for launch-xenstore Juergen Gross (4): tools: remove systemd xenstore socket definitions tools: split out xenstored starting form xencommons tools: use pidfile for test if xenstored is running tools: make xenstore domain easy configurable .gitignore | 1 + tools/configure| 7 +- tools/configure.ac | 3 +- tools/hotplug/Linux/Makefile | 1 + tools/hotplug/Linux/init.d/sysconfig.xencommons.in | 42 +++- tools/hotplug/Linux/init.d/xencommons.in | 38 +-- tools/hotplug/Linux/launch-xenstore.in | 87 tools/hotplug/Linux/systemd/Makefile | 5 - tools/hotplug/Linux/systemd/xenstored.service.in | 13 +-- tools/hotplug/Linux/systemd/xenstored.socket.in| 13 --- tools/hotplug/Linux/systemd/xenstored_ro.socket.in | 13 --- tools/ocaml/xenstored/systemd.ml | 1 - tools/ocaml/xenstored/systemd.mli | 5 - tools/ocaml/xenstored/systemd_stubs.c | 113 - tools/ocaml/xenstored/utils.ml | 21 ++-- tools/xenstore/xenstored_core.c| 92 + 16 files changed, 169 insertions(+), 286 deletions(-) create mode 100644 tools/hotplug/Linux/launch-xenstore.in delete mode 100644 tools/hotplug/Linux/systemd/xenstored.socket.in delete mode 100644 tools/hotplug/Linux/systemd/xenstored_ro.socket.in -- 2.6.6 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 3/4] tools: use pidfile for test if xenstored is running
Instead of trying to read xenstore via xenstore-read use the pidfile of xenstored for the test whether xenstored is running. This prepares support of xenstore domain, as trying to read xenstore will block for ever in case xenstore domain is started after trying to read. Signed-off-by: Juergen Gross --- tools/hotplug/Linux/init.d/xencommons.in | 2 +- tools/hotplug/Linux/launch-xenstore.in | 58 +++- 2 files changed, 35 insertions(+), 25 deletions(-) diff --git a/tools/hotplug/Linux/init.d/xencommons.in b/tools/hotplug/Linux/init.d/xencommons.in index a32608c..a6a40d6 100644 --- a/tools/hotplug/Linux/init.d/xencommons.in +++ b/tools/hotplug/Linux/init.d/xencommons.in @@ -96,7 +96,7 @@ case "$1" in do_start ;; status) -${bindir}/xenstore-read -s / +test -f @XEN_RUN_DIR@/xenstored.pid ;; stop) do_stop diff --git a/tools/hotplug/Linux/launch-xenstore.in b/tools/hotplug/Linux/launch-xenstore.in index a0cbfd3..2bd9f64 100644 --- a/tools/hotplug/Linux/launch-xenstore.in +++ b/tools/hotplug/Linux/launch-xenstore.in @@ -18,38 +18,48 @@ XENSTORED=@XENSTORED@ . @XEN_SCRIPT_DIR@/hotplugpath.sh -test -f @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons && . @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons -time=0 -timeout=30 +test_xenstore () { + test -f /var/run/xenstored.pid + return $? +} -if ! `${bindir}/xenstore-read -s / >/dev/null 2>&1` -then - test -z "$XENSTORED_ROOTDIR" && XENSTORED_ROOTDIR="@XEN_LIB_STORED@" - rm -f "$XENSTORED_ROOTDIR"/tdb* 2>/dev/null - test -z "$XENSTORED_TRACE" || XENSTORED_ARGS=" -T @XEN_LOG_DIR@/xenstored-trace.log" +timeout_xenstore () { + local time=0 + local timeout=30 - if [ -n "$XENSTORED" ] ; then - echo -n Starting $XENSTORED... - $XENSTORED --pid-file @XEN_RUN_DIR@/xenstored.pid $XENSTORED_ARGS - else - echo "No xenstored found" - exit 1 - fi - - # Wait for xenstored to actually come up, timing out after 30 seconds - while [ $time -lt $timeout ] && ! `${bindir}/xenstore-read -s / >/dev/null 2>&1` ; do - echo -n . - time=$(($time+1)) - sleep 1 + while [ $time -lt $timeout ] && ! test_xenstore ; do + echo -n . + time=$(($time+1)) + sleep 1 done echo - + # Exit if we timed out if ! [ $time -lt $timeout ] ; then - echo Could not start xenstored - exit 1 + echo "Could not start $@" + return 1 fi + + return 0 +} + +test_xenstore && exit 0 + +test -f @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons && . @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons + +test -z "$XENSTORED_ROOTDIR" && XENSTORED_ROOTDIR="@XEN_LIB_STORED@" +rm -f "$XENSTORED_ROOTDIR"/tdb* 2>/dev/null +test -z "$XENSTORED_TRACE" || XENSTORED_ARGS=" -T @XEN_LOG_DIR@/xenstored-trace.log" + +if [ -n "$XENSTORED" ] ; then + echo -n Starting $XENSTORED... + $XENSTORED --pid-file @XEN_RUN_DIR@/xenstored.pid $XENSTORED_ARGS +else + echo "No xenstored found" + exit 1 fi +timeout_xenstore $XENSTORED || exit 1 + exit 0 -- 2.6.6 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 1/4] tools: remove systemd xenstore socket definitions
On a system with systemd the xenstore sockets are created via systemd. Remove the related configuration files in order to be able to decide at runtime whether the sockets should be created or not. This will enable Xen to start xenstore either via a daemon or via a stub domain. As the xenstore domain start program will exit after it has done its job prepare the same behaviour to be tolerated by systemd for the xenstore daemon by specifying the appropriate flags in the service file. A rerun of autogen.sh is required. Signed-off-by: Juergen Gross --- V3: re-add sd_notify() call --- tools/configure| 4 +- tools/configure.ac | 2 - tools/hotplug/Linux/systemd/Makefile | 5 - tools/hotplug/Linux/systemd/xenstored.service.in | 5 +- tools/hotplug/Linux/systemd/xenstored.socket.in| 13 --- tools/hotplug/Linux/systemd/xenstored_ro.socket.in | 13 --- tools/ocaml/xenstored/systemd.ml | 1 - tools/ocaml/xenstored/systemd.mli | 5 - tools/ocaml/xenstored/systemd_stubs.c | 113 - tools/ocaml/xenstored/utils.ml | 21 ++-- tools/xenstore/xenstored_core.c| 92 + 11 files changed, 33 insertions(+), 241 deletions(-) delete mode 100644 tools/hotplug/Linux/systemd/xenstored.socket.in delete mode 100644 tools/hotplug/Linux/systemd/xenstored_ro.socket.in diff --git a/tools/configure b/tools/configure index 5b5dcce..1c84c6c 100755 --- a/tools/configure +++ b/tools/configure @@ -9670,7 +9670,7 @@ fi if test "x$systemd" = "xy"; then : -ac_config_files="$ac_config_files hotplug/Linux/systemd/proc-xen.mount hotplug/Linux/systemd/var-lib-xenstored.mount hotplug/Linux/systemd/xen-init-dom0.service hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service hotplug/Linux/systemd/xen-watchdog.service hotplug/Linux/systemd/xenconsoled.service hotplug/Linux/systemd/xendomains.service hotplug/Linux/systemd/xendriverdomain.service hotplug/Linux/systemd/xenstored.service hotplug/Linux/systemd/xenstored.socket hotplug/Linux/systemd/xenstored_ro.socket" +ac_config_files="$ac_config_files hotplug/Linux/systemd/proc-xen.mount hotplug/Linux/systemd/var-lib-xenstored.mount hotplug/Linux/systemd/xen-init-dom0.service hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service hotplug/Linux/systemd/xen-watchdog.service hotplug/Linux/systemd/xenconsoled.service hotplug/Linux/systemd/xendomains.service hotplug/Linux/systemd/xendriverdomain.service hotplug/Linux/systemd/xenstored.service" fi @@ -10394,8 +10394,6 @@ do "hotplug/Linux/systemd/xendomains.service") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/systemd/xendomains.service" ;; "hotplug/Linux/systemd/xendriverdomain.service") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/systemd/xendriverdomain.service" ;; "hotplug/Linux/systemd/xenstored.service") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/systemd/xenstored.service" ;; -"hotplug/Linux/systemd/xenstored.socket") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/systemd/xenstored.socket" ;; -"hotplug/Linux/systemd/xenstored_ro.socket") CONFIG_FILES="$CONFIG_FILES hotplug/Linux/systemd/xenstored_ro.socket" ;; *) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;; esac diff --git a/tools/configure.ac b/tools/configure.ac index 87e14a8..f991ab3 100644 --- a/tools/configure.ac +++ b/tools/configure.ac @@ -438,8 +438,6 @@ AS_IF([test "x$systemd" = "xy"], [ hotplug/Linux/systemd/xendomains.service hotplug/Linux/systemd/xendriverdomain.service hotplug/Linux/systemd/xenstored.service -hotplug/Linux/systemd/xenstored.socket -hotplug/Linux/systemd/xenstored_ro.socket ]) ]) diff --git a/tools/hotplug/Linux/systemd/Makefile b/tools/hotplug/Linux/systemd/Makefile index 558e459..7d24bbe 100644 --- a/tools/hotplug/Linux/systemd/Makefile +++ b/tools/hotplug/Linux/systemd/Makefile @@ -6,9 +6,6 @@ XEN_SYSTEMD_MODULES = xen.conf XEN_SYSTEMD_MOUNT = proc-xen.mount XEN_SYSTEMD_MOUNT += var-lib-xenstored.mount -XEN_SYSTEMD_SOCKET = xenstored.socket -XEN_SYSTEMD_SOCKET += xenstored_ro.socket - XEN_SYSTEMD_SERVICE = xenstored.service XEN_SYSTEMD_SERVICE += xenconsoled.service XEN_SYSTEMD_SERVICE += xen-qemu-dom0-disk-backend.service @@ -19,7 +16,6 @@ XEN_SYSTEMD_SERVICE += xendriverdomain.service ALL_XEN_SYSTEMD = $(XEN_SYSTEMD_MODULES) \ $(XEN_SYSTEMD_MOUNT)\ - $(XEN_SYSTEMD_SOCKET) \ $(XEN_SYSTEMD_SERVICE) .PHONY: all @@ -38,7 +34,6 @@ install: $(ALL_XEN_SYSTEMD) $(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_DIR) [ -d $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD) ] || \ $(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD) - $(INSTALL_DATA) *.socket $(DESTDIR)$(XEN_SYSTEMD_DIR) $(INSTALL_DATA) *.service $
[Xen-devel] [PATCH v3 4/4] tools: make xenstore domain easy configurable
Add configuration entries to sysconfig.xencommons for selection of the xenstore type (domain or daemon) and start the selected xenstore service via a script called from sysvinit or systemd. Signed-off-by: Juergen Gross --- V3: - remove check for running xenstore domain, as this is done in init-xenstore-domain already - if booted with systemd send a systemd-notify message in the xenstore domain case - if booted with systemd don't wait until xenstore daemon is running, as the daemon will have sent a notify message by its own - split up patch as requested by Ian Jackson V2: - add .gitignore entry for launch-xenstore --- tools/hotplug/Linux/init.d/sysconfig.xencommons.in | 42 -- tools/hotplug/Linux/launch-xenstore.in | 42 -- tools/hotplug/Linux/systemd/xenstored.service.in | 8 + 3 files changed, 72 insertions(+), 20 deletions(-) diff --git a/tools/hotplug/Linux/init.d/sysconfig.xencommons.in b/tools/hotplug/Linux/init.d/sysconfig.xencommons.in index c27a476..cc8185c 100644 --- a/tools/hotplug/Linux/init.d/sysconfig.xencommons.in +++ b/tools/hotplug/Linux/init.d/sysconfig.xencommons.in @@ -6,12 +6,24 @@ #XENCONSOLED_TRACE=[none|guest|hv|all] ## Type: string +## Default: daemon +# +# Select type of xentore service. +# +# This can be either of: +# * daemon +# * domain +# +# Changing this requires a reboot to take effect. +# +#XENSTORETYPE=daemon + +## Type: string ## Default: xenstored # # Select xenstore implementation, this can be either -# of these below. If using systemd it's preferred that you -# just edit the xenstored.service unit file and change -# the XENSTORED variable there. +# of these below. +# Only evaluated if XENSTORETYPE is "daemon". # # This can be either of: # * @sbindir@/oxenstored @@ -26,21 +38,45 @@ # Additional commandline arguments to start xenstored, # like "--trace-file @XEN_LOG_DIR@/xenstored-trace.log" # See "@sbindir@/xenstored --help" for possible options. +# Only evaluated if XENSTORETYPE is "daemon". XENSTORED_ARGS= ## Type: string ## Default: Not defined, tracing off # # Log xenstored messages +# Only evaluated if XENSTORETYPE is "daemon". #XENSTORED_TRACE=[yes|on|1] ## Type: string ## Default: "@XEN_LIB_STORED@" # # Running xenstored on XENSTORED_ROOTDIR +# Only evaluated if XENSTORETYPE is "daemon". #XENSTORED_ROOTDIR=@XEN_LIB_STORED@ ## Type: string +## Default: @LIBEXEC@/boot/xenstore-stubdom.gz +# +# xenstore domain kernel. +# Only evaluated if XENSTORETYPE is "domain". +#XENSTORE_DOMAIN_KERNEL=@LIBEXEC@/boot/xenstore-stubdom.gz + +## Type: integer +## Default: 8 +# +# xenstore domain memory size in MiB. +# Only evaluated if XENSTORETYPE is "domain". +#XENSTORE_DOMAIN_SIZE=8 + +## Type: string +## Default: "" +# +# Additional arguments for starting the xenstore domain. +# Only evaluated if XENSTORETYPE is "domain". +XENSTORE_DOMAIN_ARGS= + +## Type: string ## Default: Not defined, xenbackendd debug mode off # # Running xenbackendd in debug mode diff --git a/tools/hotplug/Linux/launch-xenstore.in b/tools/hotplug/Linux/launch-xenstore.in index 2bd9f64..fdfa33a 100644 --- a/tools/hotplug/Linux/launch-xenstore.in +++ b/tools/hotplug/Linux/launch-xenstore.in @@ -48,18 +48,40 @@ test_xenstore && exit 0 test -f @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons && . @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons -test -z "$XENSTORED_ROOTDIR" && XENSTORED_ROOTDIR="@XEN_LIB_STORED@" -rm -f "$XENSTORED_ROOTDIR"/tdb* 2>/dev/null -test -z "$XENSTORED_TRACE" || XENSTORED_ARGS=" -T @XEN_LOG_DIR@/xenstored-trace.log" +[ "$XENSTORETYPE" = "" ] && XENSTORETYPE=daemon + +/bin/mkdir -p @XEN_RUN_DIR@ + +[ "$XENSTORETYPE" = "daemon" ] && { + [ -z "$XENSTORED_ROOTDIR" ] && XENSTORED_ROOTDIR="@XEN_LIB_STORED@" + /bin/rm -f $XENSTORED_ROOTDIR/tdb* 2>/dev/null + [ -z "$XENSTORED_TRACE" ] || XENSTORED_ARGS="$XENSTORED_ARGS -T @XEN_LOG_DIR@/xenstored-trace.log" + [ -z "$XENSTORED" ] && XENSTORED=@XENSTORED@ + [ -x "$XENSTORED" ] || { + echo "No xenstored found" + exit 1 + } -if [ -n "$XENSTORED" ] ; then echo -n Starting $XENSTORED... $XENSTORED --pid-file @XEN_RUN_DIR@/xenstored.pid $XENSTORED_ARGS -else - echo "No xenstored found" - exit 1 -fi -timeout_xenstore $XENSTORED || exit 1 + systemd-notify --booted 2>/dev/null || timeout_xenstore $XENSTORED || exit 1 -exit 0 + exit 0 +} + +[ "$XENSTORETYPE" = "domain" ] && { + [ -z "$XENSTORE_DOMAIN_KERNEL" ] && XENSTORE_DOMAIN_KERNEL=@LIBEXEC@/boot/xenstore-stubdom.gz + XENSTORE_DOMAIN_ARGS="$XENSTORE_DOMAIN_ARGS --kernel $XENSTORE_DOMAIN_KERNEL" + [ -z "$XENSTORE_DOMAIN_SIZE" ] && XENSTORE_DOMAIN_SIZE=8 + XENSTORE_DOMAIN_ARGS="$XENSTORE_DOMAIN_ARGS --memory $XENSTORE_DOMAIN_SIZE" + + echo -n Starting $XENSTORE_DOMAIN_KERNEL... + ${LIBEXEC_BIN}/init-xenstore-domain $XENSTORE_D
Re: [Xen-devel] [PATCH v5 1/7] xen/arm: Introduce alternative runtime patching
On 22/07/16 15:15, Konrad Rzeszutek Wilk wrote: diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c new file mode 100644 index 000..d00f98e --- /dev/null +++ b/xen/arch/arm/alternative.c Hey! Hi Konrad, I've some comments, most of them are light. What I am concerned most is the difference between comments in the header vs the code. Looking at Linux code it seems to have the same issue - and I know you don't want to differ to much from Linux. Well, I had to differ from Linux because the code was ARM64 specific (i.e renaming function and moving code around to potentially allow ARM32 support). So I don't mind to fix them here and... Perhaps go with the changes as is (With the comments being off) and the then have some patches for Linux for the header mismatches and those can be backported? we can think about Linux changes and resync the code later. ..snip.. +static int __apply_alternatives(const struct alt_region *region) +{ +const struct alt_instr *alt; +const u32 *origptr, *replptr; +u32 *writeptr, *writemap; +mfn_t text_mfn = _mfn(virt_to_mfn(_stext)); +unsigned int text_order = get_order_from_bytes(_end - _start); + +printk("Patching kernel code\n"); I see you use prefixed later with 'alternatives:' and also this is without any prefix (DEBUG, or ERR). Would it make sense to have this one above be dprintk while the one below with printk (KERN_ERR) ? I think this one is useful in non-debug build to know that we are patching the code. So if it crash, we know where we are. + +/* + * The text section is read-only. So re-map Xen to be able to patch + * the code. + */ +writemap = __vmap(&text_mfn, 1 << text_order, 1, 1, PAGE_HYPERVISOR, + VMAP_DEFAULT); +if ( !writemap ) +{ +printk("alternatives: Unable to map the text section\n"); Perhaps include the size of the text section? That may help in figuring out what went wrong or if one could increase the VMAP area? Good point. I will do it in the next version. +return -ENOMEM; +} + +for ( alt = region->begin; alt < region->end; alt++ ) +{ +u32 insn; +int i, nr_inst; + +if ( !cpus_have_cap(alt->cpufeature) ) +continue; + +BUG_ON(alt->alt_len != alt->orig_len); The header file says: + u8 alt_len;/* size of new instruction(s), <= orig_len */ So this BUG_ON seems incorrect? But hten later in the header it says: "1. Be exactly the same length (in bytes) as the default code + *sequence." So the BUG_ON is correct. Perhaps the 'alt_len' comment should be s/<=/==/ ? Good point. I suspect the plan was to support sequence with different length but it has been abandoned later on (CC Andre to confirm). + +origptr = ALT_ORIG_PTR(alt); +writeptr = origptr - (u32 *)_start + writemap; How about just using writeptr += ? I am not sure about your suggestion here. Regardless the Linux code, the origptr will not follow a pattern at each iteration. So we have to recompute it everytime. Ah wait. You are trying to preserve the Linux code!. Nevermind then. +replptr = ALT_REPL_PTR(alt); + +nr_inst = alt->alt_len / sizeof(insn); + +for ( i = 0; i < nr_inst; i++ ) +{ +insn = get_alt_insn(alt, origptr + i, replptr + i); +*(writeptr + i) = cpu_to_le32(insn); +} + +/* Ensure the new instructions reached the memory and nuke */ +clean_and_invalidate_dcache_va_range(writeptr, + (sizeof (*writeptr) * nr_inst)); +} + +/* Nuke the instruction cache */ +invalidate_icache(); + +vunmap(writemap); + +return 0; +} + +/* + * We might be patching the stop_machine state machine, so implement a + * really simple polling protocol here. + */ +static int __apply_alternatives_multi_stop(void *unused) +{ +static int patched = 0; Shouldn't this be 'atomic_t' ? Does it matter? From my understanding the code will behave the same. +struct alt_region region = { Can this 'const' ? It could, I just followed Linux but as we differ I will move to const. +.begin = __alt_instructions, +.end = __alt_instructions_end, +}; + +/* We always have a CPU 0 at this point (__init) */ +if ( smp_processor_id() ) +{ +while ( !read_atomic(&patched) ) +cpu_relax(); +isb(); +} +else +{ +int ret; + +BUG_ON(patched); +ret = __apply_alternatives(®ion); +/* The patching is not expected to fail during boot. */ +BUG_ON(ret != 0); + +/* Barriers provided by the cache flushing */ +write_atomic(&patched, 1); +} + +return 0; +} + +/* + * This function should only be called during boot and before CPU0 jump + * into the idle_loop. So could you add an: + */ +void __init apply_alternatives_all(
Re: [Xen-devel] [PATCH v5 2/7] xen/arm: cpufeature: Provide an helper to check if a capability is supported
Hi Konrad, On 22/07/16 15:18, Konrad Rzeszutek Wilk wrote: On Wed, Jul 20, 2016 at 04:25:55PM +0100, Julien Grall wrote: diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c index 7a1b56b..088625b 100644 --- a/xen/arch/arm/cpufeature.c +++ b/xen/arch/arm/cpufeature.c @@ -24,6 +24,22 @@ DECLARE_BITMAP(cpu_hwcaps, ARM_NCAPS); +void update_cpu_capabilities(const struct arm_cpu_capabilities *caps, + const char *info) +{ +int i; + +for ( i = 0; caps[i].matches; i++ ) +{ +if ( !caps[i].matches(&caps[i]) ) So what if the 'struct arm_cpu_capabilitues' has '->matches' set to NULL? It is the exit condition of the loop: for ( i = 0; caps[i].matches; i++ ) Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 2/7] xen/arm: cpufeature: Provide an helper to check if a capability is supported
On Fri, Jul 22, 2016 at 04:31:13PM +0100, Julien Grall wrote: > Hi Konrad, > > On 22/07/16 15:18, Konrad Rzeszutek Wilk wrote: > >On Wed, Jul 20, 2016 at 04:25:55PM +0100, Julien Grall wrote: > >>diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c > >>index 7a1b56b..088625b 100644 > >>--- a/xen/arch/arm/cpufeature.c > >>+++ b/xen/arch/arm/cpufeature.c > >>@@ -24,6 +24,22 @@ > >> > >> DECLARE_BITMAP(cpu_hwcaps, ARM_NCAPS); > >> > >>+void update_cpu_capabilities(const struct arm_cpu_capabilities *caps, > >>+ const char *info) > >>+{ > >>+int i; > >>+ > >>+for ( i = 0; caps[i].matches; i++ ) > >>+{ > >>+if ( !caps[i].matches(&caps[i]) ) > > > >So what if the 'struct arm_cpu_capabilitues' has '->matches' set to > >NULL? > > It is the exit condition of the loop: > > for ( i = 0; caps[i].matches; i++ ) Duh! Ignore my query then please. > > Regards, > > -- > Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 1/7] xen/arm: Introduce alternative runtime patching
> >>+ > >>+origptr = ALT_ORIG_PTR(alt); > >>+writeptr = origptr - (u32 *)_start + writemap; > > > >How about just using writeptr += ? > > I am not sure about your suggestion here. Regardless the Linux code, the > origptr will not follow a pattern at each iteration. So we have to recompute > it everytime. Just ignore that. I somehow equated writeptr to writemp. ..snip.. > > > > >Ah wait. You are trying to preserve the Linux code!. Nevermind then. > > > >>+replptr = ALT_REPL_PTR(alt); > >>+ > >>+nr_inst = alt->alt_len / sizeof(insn); > >>+ > >>+for ( i = 0; i < nr_inst; i++ ) > >>+{ > >>+insn = get_alt_insn(alt, origptr + i, replptr + i); > >>+*(writeptr + i) = cpu_to_le32(insn); > >>+} > >>+ > >>+/* Ensure the new instructions reached the memory and nuke */ > >>+clean_and_invalidate_dcache_va_range(writeptr, > >>+ (sizeof (*writeptr) * > >>nr_inst)); > >>+} > >>+ > >>+/* Nuke the instruction cache */ > >>+invalidate_icache(); > >>+ > >>+vunmap(writemap); > >>+ > >>+return 0; > >>+} > >>+ > >>+/* > >>+ * We might be patching the stop_machine state machine, so implement a > >>+ * really simple polling protocol here. > >>+ */ > >>+static int __apply_alternatives_multi_stop(void *unused) > >>+{ > >>+static int patched = 0; > > > >Shouldn't this be 'atomic_t' ? > > Does it matter? From my understanding the code will behave the same. Not at all under the hood. But I see 'atomic_write' and they all operate on the 'atomic_t' .. hence the query. ..snip. > >This being a new file perhaps add: > >* > > * Local variables: > > * mode: C > > * c-file-style: "BSD" > > * c-basic-offset: 4 > > * indent-tabs-mode: nil > > * End: > > */ > >? > > It is a Linux file with Linux coding style. I would need to look what should > be the emacs magic block here. Right. I just meant that you needed the magic block. > > Regards, > > -- > Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 1/7] xen/arm: Introduce alternative runtime patching
On 22/07/16 16:38, Konrad Rzeszutek Wilk wrote: + +origptr = ALT_ORIG_PTR(alt); +writeptr = origptr - (u32 *)_start + writemap; How about just using writeptr += ? I am not sure about your suggestion here. Regardless the Linux code, the origptr will not follow a pattern at each iteration. So we have to recompute it everytime. Just ignore that. I somehow equated writeptr to writemp. ..snip.. Ah wait. You are trying to preserve the Linux code!. Nevermind then. +replptr = ALT_REPL_PTR(alt); + +nr_inst = alt->alt_len / sizeof(insn); + +for ( i = 0; i < nr_inst; i++ ) +{ +insn = get_alt_insn(alt, origptr + i, replptr + i); +*(writeptr + i) = cpu_to_le32(insn); +} + +/* Ensure the new instructions reached the memory and nuke */ +clean_and_invalidate_dcache_va_range(writeptr, + (sizeof (*writeptr) * nr_inst)); +} + +/* Nuke the instruction cache */ +invalidate_icache(); + +vunmap(writemap); + +return 0; +} + +/* + * We might be patching the stop_machine state machine, so implement a + * really simple polling protocol here. + */ +static int __apply_alternatives_multi_stop(void *unused) +{ +static int patched = 0; Shouldn't this be 'atomic_t' ? Does it matter? From my understanding the code will behave the same. Not at all under the hood. But I see 'atomic_write' and they all operate on the 'atomic_t' .. hence the query. The code is using *_atomic (and not atomic_*) which operate on any type. I know the naming is really confusing. I would like to rename write_atomic and read_atomic to WRITE_ONCE and READ_ONCE. ..snip. This being a new file perhaps add: * * Local variables: * mode: C * c-file-style: "BSD" * c-basic-offset: 4 * indent-tabs-mode: nil * End: */ ? It is a Linux file with Linux coding style. I would need to look what should be the emacs magic block here. Right. I just meant that you needed the magic block. I don't think we ever put magic block on Linux file. I will try to find out and update the patch. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 4:00 PM, Razvan Cojocaru wrote: On 07/22/2016 03:32 PM, Corneliu ZUZU wrote: Look @ hvm_do_resume(): 1. If you, as a toolstack user, get at sane step no. 6: "Uninitialize everything (no events are possible here because of steps 4-5)." 2. But just before you do that, a hvm_do_resume() happens on an arbitrary vCPU of the domain and gets to this point: if ( unlikely(v->arch.vm_event) ) { // ---> let's say you are now at this point, just after the above non-NULL check struct monitor_write_data *w = &v->arch.vm_event->write_data; 3. and before proceeding further, the uninitialization sequence _completes_, i.e. this was done in vm_event_cleanup_domain(): for_each_vcpu ( d, v ) { xfree(v->arch.vm_event); v->arch.vm_event = NULL; } 4. and then in hvm_do_resume() this gets to be done: struct monitor_write_data *w = &v->arch.vm_event->write_data; [...] if ( w->do_write.msr ) { hvm_msr_write_intercept(w->msr, w->value, 0); w->do_write.msr = 0; } then you'll have a beautiful NULL-pointer access hypervisor crash. As I've said before, of course I agree with that Sorry, I thought by your previous statement "I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved" you were inferring that with a 'sane' toolstack user, as you defined it, it won't be possible to have a hypervisor crash. I did (please see below). Then, again, you'd be wrong, the hvm_do_resume() was just one example, in a similar fashion this can happen in a number of other places. Details will be given in the patch-series. - isn't that what a previous patch you've submitted addressed? I'm not sure what patch you're referring to, I don't remember ever addressing these issues before.. I am talking about your patch "x86/monitor: fix: don't compromise a monitor_write_data with pending CR writes", which, if I'm not mistaken, addresses the above problem (since we won't xfree(v->arch.vm_event) on monitor uninit anymore). Heh, I wouldn't call fixing problem A and by -coincidence- with that fix also having -totally unrelated- problem B go away "addressing problem B". I didn't even mention these locking issues in that patch, so how could I have addressed them there? With the above problem fixed, the workflow I suggested should be fine. The fact of the matter remains that the current state of staging (which is what this thread applies to) still has the hvm_do_resume() issue as well as the others. Again, I would prefer things to be rock solid, so personally I encourage your patch and hope we get it in. Thanks, Razvan Corneliu. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] Infrastructure Maintenance Window: July 25, 2016 from 8:00 - 10:00 UTC
Hi everyone, just a quick note that we will perform the following maintenance activities on Monady July 25, from 8:00 - 10:00 UTC. The following work will be performed: - OS Upgrade of bugs.xenproject.org to Debian Jessie (this will take 1-2 hours) Best Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/4] tools: remove systemd xenstore socket definitions
Only skim-read this patch, will do proper review later. On Fri, Jul 22, 2016 at 05:09:28PM +0200, Juergen Gross wrote: [...] > CAMLprim value ocaml_launched_by_systemd(value ignore) > { > - CAMLparam1(ignore); > - CAMLlocal1(ret); > + CAMLparam1(ignore); > + CAMLlocal1(ret); > > - ret = Val_false; > + ret = Val_false; > > - if (sd_listen_fds(0) > 0) > - ret = Val_true; > + if (sd_booted() > 0) > + ret = Val_true; I think this may be problematic. sd_booted returns true if system is booted with systemd, but it has no bearing whether this particular process is launched by systemd. IIRC using sd_booted would cause oxenstored thinks it is launched by systemd even if the user launches it by hand in a shell. That caused it's initialisation to fail. 81d758afca7c3c1e3ccbd78154b33d64fd7757fb was written to address that issue. So, what would happen if you start oxenstored by hand with your patch apply? Maybe we can just remove this launched_by_systemd check all together -- i.e. we always call sd_notify? > > - CAMLreturn(ret); > + CAMLreturn(ret); > } > > CAMLprim value ocaml_sd_notify_ready(value ignore) > { > - CAMLparam1(ignore); > - CAMLlocal1(ret); > + CAMLparam1(ignore); > + CAMLlocal1(ret); > > - ret = Val_int(0); > + ret = Val_int(0); > > - sd_notify(1, "READY=1"); > + sd_notify(1, "READY=1"); > > - CAMLreturn(ret); > + CAMLreturn(ret); It seems that you have introduced quite a few white space changes. If you really want to change tabs to spaces, please do that in a separate patch. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH] tools/libxc: Properly increment ApicIdCoreSize field on AMD
Current code incorrectly adds 1 to full register instead of incrementing the field in bits 15:12. Signed-off-by: Boris Ostrovsky --- tools/libxc/xc_cpuid_x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index 84f4e08..fbbac9e 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -331,7 +331,8 @@ static void amd_xc_cpuid_policy(xc_interface *xch, * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus one). * Update to reflect vLAPIC_ID = vCPU_ID * 2. */ -regs[2] = ((regs[2] & 0xf000u) + 1) | ((regs[2] & 0xffu) << 1) | 1u; +regs[2] = ((regs[2] + (1u << 12)) & 0xf000u) | + ((regs[2] & 0xffu) << 1) | 1u; break; case 0x800a: { -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxc: Properly increment ApicIdCoreSize field on AMD
On Fri, Jul 22, 2016 at 01:14:01PM -0400, Boris Ostrovsky wrote: > Current code incorrectly adds 1 to full register instead of > incrementing the field in bits 15:12. > > Signed-off-by: Boris Ostrovsky Acked-by: Wei Liu I trust your expertise in this field. :-) > --- > tools/libxc/xc_cpuid_x86.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c > index 84f4e08..fbbac9e 100644 > --- a/tools/libxc/xc_cpuid_x86.c > +++ b/tools/libxc/xc_cpuid_x86.c > @@ -331,7 +331,8 @@ static void amd_xc_cpuid_policy(xc_interface *xch, > * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus > one). > * Update to reflect vLAPIC_ID = vCPU_ID * 2. > */ > -regs[2] = ((regs[2] & 0xf000u) + 1) | ((regs[2] & 0xffu) << 1) | 1u; > +regs[2] = ((regs[2] + (1u << 12)) & 0xf000u) | > + ((regs[2] & 0xffu) << 1) | 1u; > break; > > case 0x800a: { > -- > 1.8.3.1 > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxc: Properly increment ApicIdCoreSize field on AMD
On 07/22/2016 01:38 PM, Wei Liu wrote: > On Fri, Jul 22, 2016 at 01:14:01PM -0400, Boris Ostrovsky wrote: >> Current code incorrectly adds 1 to full register instead of >> incrementing the field in bits 15:12. >> >> Signed-off-by: Boris Ostrovsky > Acked-by: Wei Liu > > I trust your expertise in this field. :-) Just in this field? The field is only 4 bits! ;-) But this actually fixes a regression that was triggered by recent hvmloader change on one particular processor that we have in the test farm. -boris > >> --- >> tools/libxc/xc_cpuid_x86.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c >> index 84f4e08..fbbac9e 100644 >> --- a/tools/libxc/xc_cpuid_x86.c >> +++ b/tools/libxc/xc_cpuid_x86.c >> @@ -331,7 +331,8 @@ static void amd_xc_cpuid_policy(xc_interface *xch, >> * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus >> one). >> * Update to reflect vLAPIC_ID = vCPU_ID * 2. >> */ >> -regs[2] = ((regs[2] & 0xf000u) + 1) | ((regs[2] & 0xffu) << 1) | 1u; >> +regs[2] = ((regs[2] + (1u << 12)) & 0xf000u) | >> + ((regs[2] & 0xffu) << 1) | 1u; >> break; >> >> case 0x800a: { >> -- >> 1.8.3.1 >> ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxc: Properly increment ApicIdCoreSize field on AMD
On Fri, Jul 22, 2016 at 01:45:05PM -0400, Boris Ostrovsky wrote: > On 07/22/2016 01:38 PM, Wei Liu wrote: > > On Fri, Jul 22, 2016 at 01:14:01PM -0400, Boris Ostrovsky wrote: > >> Current code incorrectly adds 1 to full register instead of > >> incrementing the field in bits 15:12. > >> > >> Signed-off-by: Boris Ostrovsky > > Acked-by: Wei Liu > > > > I trust your expertise in this field. :-) > > > Just in this field? The field is only 4 bits! ;-) > No, it's only one bit! :-P > But this actually fixes a regression that was triggered by recent > hvmloader change on one particular processor that we have in the test farm. Ian, this is a backport candidate and needs to go back as far as possible. The bogus calculation was introduced in 2008 (!). Wei. > > -boris > > > > >> --- > >> tools/libxc/xc_cpuid_x86.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c > >> index 84f4e08..fbbac9e 100644 > >> --- a/tools/libxc/xc_cpuid_x86.c > >> +++ b/tools/libxc/xc_cpuid_x86.c > >> @@ -331,7 +331,8 @@ static void amd_xc_cpuid_policy(xc_interface *xch, > >> * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus > >> one). > >> * Update to reflect vLAPIC_ID = vCPU_ID * 2. > >> */ > >> -regs[2] = ((regs[2] & 0xf000u) + 1) | ((regs[2] & 0xffu) << 1) | > >> 1u; > >> +regs[2] = ((regs[2] + (1u << 12)) & 0xf000u) | > >> + ((regs[2] & 0xffu) << 1) | 1u; > >> break; > >> > >> case 0x800a: { > >> -- > >> 1.8.3.1 > >> > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/4] tools: remove systemd xenstore socket definitions
On 22/07/16 18:31, Wei Liu wrote: > Only skim-read this patch, will do proper review later. > > On Fri, Jul 22, 2016 at 05:09:28PM +0200, Juergen Gross wrote: > [...] >> CAMLprim value ocaml_launched_by_systemd(value ignore) >> { >> -CAMLparam1(ignore); >> -CAMLlocal1(ret); >> + CAMLparam1(ignore); >> + CAMLlocal1(ret); >> >> -ret = Val_false; >> + ret = Val_false; >> >> -if (sd_listen_fds(0) > 0) >> -ret = Val_true; >> + if (sd_booted() > 0) >> + ret = Val_true; > > I think this may be problematic. > > sd_booted returns true if system is booted with systemd, but it has no > bearing whether this particular process is launched by systemd. > > IIRC using sd_booted would cause oxenstored thinks it is launched by > systemd even if the user launches it by hand in a shell. That caused > it's initialisation to fail. 81d758afca7c3c1e3ccbd78154b33d64fd7757fb > was written to address that issue. > > So, what would happen if you start oxenstored by hand with your patch > apply? Maybe we can just remove this launched_by_systemd check all > together -- i.e. we always call sd_notify? So you are concerned sd_notify() will be called too often, but you are suggesting to call it always? I don't understand your concerns then. > >> >> -CAMLreturn(ret); >> + CAMLreturn(ret); >> } >> >> CAMLprim value ocaml_sd_notify_ready(value ignore) >> { >> -CAMLparam1(ignore); >> -CAMLlocal1(ret); >> + CAMLparam1(ignore); >> + CAMLlocal1(ret); >> >> -ret = Val_int(0); >> + ret = Val_int(0); >> >> -sd_notify(1, "READY=1"); >> + sd_notify(1, "READY=1"); >> >> -CAMLreturn(ret); >> + CAMLreturn(ret); > > It seems that you have introduced quite a few white space changes. > If you really want to change tabs to spaces, please do that in a > separate patch. Oops, didn't mean to do this. Sorry. Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/4] tools: remove systemd xenstore socket definitions
On Fri, Jul 22, 2016 at 08:49:17PM +0200, Juergen Gross wrote: > On 22/07/16 18:31, Wei Liu wrote: > > Only skim-read this patch, will do proper review later. > > > > On Fri, Jul 22, 2016 at 05:09:28PM +0200, Juergen Gross wrote: > > [...] > >> CAMLprim value ocaml_launched_by_systemd(value ignore) > >> { > >> - CAMLparam1(ignore); > >> - CAMLlocal1(ret); > >> + CAMLparam1(ignore); > >> + CAMLlocal1(ret); > >> > >> - ret = Val_false; > >> + ret = Val_false; > >> > >> - if (sd_listen_fds(0) > 0) > >> - ret = Val_true; > >> + if (sd_booted() > 0) > >> + ret = Val_true; > > > > I think this may be problematic. > > > > sd_booted returns true if system is booted with systemd, but it has no > > bearing whether this particular process is launched by systemd. > > > > IIRC using sd_booted would cause oxenstored thinks it is launched by > > systemd even if the user launches it by hand in a shell. That caused > > it's initialisation to fail. 81d758afca7c3c1e3ccbd78154b33d64fd7757fb > > was written to address that issue. > > > > So, what would happen if you start oxenstored by hand with your patch > > apply? Maybe we can just remove this launched_by_systemd check all > > together -- i.e. we always call sd_notify? > > So you are concerned sd_notify() will be called too often, but you are > suggesting to call it always? I don't understand your concerns then. > No, my concern is that you won't be able to start oxenstored from command line manually if you boot with systemd. At least that was the bug that caused me to write that patch. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC v3 09/13] firmware: port built-in section to linker table
This ports built-in firmware to use linker tables, this replaces the custom section solution with a generic solution. This also demos the use of the .rodata (SECTION_RO) linker tables. Tested with 0 built-in firmware, 1 and 2 built-in firmwares successfully. v3: o explicitly include tables.h as we no longer include tables.h from sections.h o use new section_tbl_asmtype() helper on firmware/Makefile to enable having to unfold things on our own. v2: introduced this file in this version of the series Signed-off-by: Luis R. Rodriguez --- arch/x86/kernel/cpu/microcode/core.c | 8 drivers/base/firmware_class.c| 12 ++-- firmware/Makefile| 4 +++- include/asm-generic/vmlinux.lds.h| 7 --- 4 files changed, 13 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c index df04b2d033f6..3e7c08d99601 100644 --- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -91,15 +92,14 @@ static bool __init check_loader_disabled_bsp(void) return *res; } -extern struct builtin_fw __start_builtin_fw[]; -extern struct builtin_fw __end_builtin_fw[]; +DECLARE_LINKTABLE_RO(struct builtin_fw, builtin_fw); bool get_builtin_firmware(struct cpio_data *cd, const char *name) { #ifdef CONFIG_FW_LOADER - struct builtin_fw *b_fw; + const struct builtin_fw *b_fw; - for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) { + LINKTABLE_FOR_EACH(b_fw, builtin_fw) { if (!strcmp(name, b_fw->name)) { cd->size = b_fw->size; cd->data = b_fw->data; diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 22d1760a4278..8fbf03c3e4c2 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -30,6 +30,7 @@ #include #include #include +#include #include @@ -43,15 +44,14 @@ MODULE_LICENSE("GPL"); #ifdef CONFIG_FW_LOADER -extern struct builtin_fw __start_builtin_fw[]; -extern struct builtin_fw __end_builtin_fw[]; +DEFINE_LINKTABLE_RO(struct builtin_fw, builtin_fw); static bool fw_get_builtin_firmware(struct firmware *fw, const char *name, void *buf, size_t size) { - struct builtin_fw *b_fw; + const struct builtin_fw *b_fw; - for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) { + LINKTABLE_FOR_EACH(b_fw, builtin_fw) { if (strcmp(name, b_fw->name) == 0) { fw->size = b_fw->size; fw->data = b_fw->data; @@ -67,9 +67,9 @@ static bool fw_get_builtin_firmware(struct firmware *fw, const char *name, static bool fw_is_builtin_firmware(const struct firmware *fw) { - struct builtin_fw *b_fw; + const struct builtin_fw *b_fw; - for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) + LINKTABLE_FOR_EACH(b_fw, builtin_fw) if (fw->data == b_fw->data) return true; diff --git a/firmware/Makefile b/firmware/Makefile index fa3e81c2a97b..32304a227f09 100644 --- a/firmware/Makefile +++ b/firmware/Makefile @@ -155,6 +155,8 @@ quiet_cmd_fwbin = MK_FW $@ ASM_ALIGN=$(if $(CONFIG_64BIT),3,2); \ PROGBITS=$(if $(CONFIG_ARM),%,@)progbits; \ echo "/* Generated by firmware/Makefile */" > $@;\ + echo "\#include " >>$@;\ + echo "\#include " >>$@;\ echo ".section .rodata" >>$@;\ echo ".p2align $${ASM_ALIGN}" >>$@;\ echo "_fw_$${FWSTR}_bin:" >>$@;\ @@ -164,7 +166,7 @@ quiet_cmd_fwbin = MK_FW $@ echo ".p2align $${ASM_ALIGN}" >>$@;\ echo "_fw_$${FWSTR}_name:">>$@;\ echo ".string \"$$FWNAME\"" >>$@;\ - echo ".section .builtin_fw,\"a\",$${PROGBITS}">>$@;\ + echo "section_tbl_asmtype(SECTION_RODATA, builtin_fw, SECTION_ORDER_ANY, a,$${PROGBITS})" >>$@;\ echo ".p2align $${ASM_ALIGN}" >>$@;\ echo "$${ASM_WORD} _fw_$${FWSTR}_name">>$@;\ echo "$${ASM_WORD} _fw_$${FWSTR}_bin" >>$@;\ diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index b3a9cd7bc947..8e31a4454841 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -313,13 +313,6 @@ VMLINUX_SYMBOL(__end_pci_fixups_suspend_late)
[Xen-devel] [RFC v3 01/13] x86: remove LTO_REFERENCE_INITCALL()
The setup for LTO never made it upstream, and although this has some users, this is now really old stuff for a gcc 4.7 LTO problem. We know that at least LTO_REFERENCE_INITCALL() work around can be removed if LTO is not supported on v4.7 anymore. As per Andi the DISABLE_LTO and LTO_CFLAGS are still neeeded though. v3: added to this series, removing LTO_REFERENCE_INITCALL() justifies that other future similar macros do not need a respective LTO_REFERENCE_INITCALL() on them therefore simplifying new code. Acked-by: Andi Kleen Signed-off-by: Luis R. Rodriguez --- include/linux/init.h | 20 +--- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/include/linux/init.h b/include/linux/init.h index 1e5c131d5c9a..aa662ad80d9c 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -151,23 +151,6 @@ extern bool initcall_debug; #ifndef __ASSEMBLY__ -#ifdef CONFIG_LTO -/* Work around a LTO gcc problem: when there is no reference to a variable - * in a module it will be moved to the end of the program. This causes - * reordering of initcalls which the kernel does not like. - * Add a dummy reference function to avoid this. The function is - * deleted by the linker. - */ -#define LTO_REFERENCE_INITCALL(x) \ - ; /* yes this is needed */ \ - static __used __exit void *reference_##x(void) \ - { \ - return &x; \ - } -#else -#define LTO_REFERENCE_INITCALL(x) -#endif - /* initcalls are now grouped by functionality into separate * subsections. Ordering inside the subsections is determined * by link order. @@ -180,8 +163,7 @@ extern bool initcall_debug; #define __define_initcall(fn, id) \ static initcall_t __initcall_##fn##id __used \ - __attribute__((__section__(".initcall" #id ".init"))) = fn; \ - LTO_REFERENCE_INITCALL(__initcall_##fn##id) + __attribute__((__section__(".initcall" #id ".init"))) = fn; /* * Early initcalls run before initializing SMP. -- 2.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC v3 10/13] jump_label: port __jump_table to linker tables
Move the __jump_table from the a custom section solution to a generic solution, this avoiding extra vmlinux.lds.h customizations. This also demos the use of the .data (SECTION_DATA) linker table and of the shared asm call push_section_tbl(). Built-in kernel functionality was tested with CONFIG_STATIC_KEYS_SELFTEST. Moduler kernel functionality was tested with CONFIG_TEST_STATIC_KEYS. Both work as expected. Since __jump_table sections are also supported per module this also required expanding module-common.lds.S to capture and fold all .data.tlb.__jump_table.* onto the the section __jump_table -- in this case for modules need to keep a reference in place, given the alternative is to use DEFINE_LINKTABLE(struct jump_entry, __jump_table) per module -- and later through macro hacks instantiate the jump entries per module upon init. This is doable but we'd loose out on the sorting of the table using the linker, to sort we'd always still need to expand the module common linker script. An alternative mechanism is possible which would make these custom module sections extensions dynamic without requiring manual changes, this however is best done later through a separate evolution once linker tables are in place. A careful reviewer may note that some architectures use "\n\t" to separate asm code, while others just use a new line. Upon review last time it was deemed reasonable to for all architectures to juse use "\n", this is defined as ASM_CMD_SEP, and if an architecture needs to override they can do so on their architecture sections.h prior to including asm-generic/sections.h v3: o More elaborate tests performed o first modular support use case, module tested was CONFIG_TEST_STATIC_KEYS (lib/test_static_keys.ko), this required us to extend module-common.lds.S o use generic push_section_tbl_any() for all architectures o Makes use of ASM_CMD_SEP to enable architectures to override later if needed o guard tables.h inclusion and table definition with __KERNEL__ v2: introduced in this series Signed-off-by: Luis R. Rodriguez --- arch/arm/include/asm/jump_label.h | 6 -- arch/arm64/include/asm/jump_label.h | 6 -- arch/mips/include/asm/jump_label.h| 6 -- arch/powerpc/include/asm/jump_label.h | 8 +--- arch/s390/include/asm/jump_label.h| 6 -- arch/sparc/include/asm/jump_label.h | 6 -- arch/x86/include/asm/jump_label.h | 10 ++ include/asm-generic/vmlinux.lds.h | 4 include/linux/jump_label.h| 10 ++ kernel/jump_label.c | 17 ++--- scripts/module-common.lds.S | 5 + tools/objtool/special.c | 8 +++- 12 files changed, 59 insertions(+), 33 deletions(-) diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h index 34f7b6980d21..960135a7b88e 100644 --- a/arch/arm/include/asm/jump_label.h +++ b/arch/arm/include/asm/jump_label.h @@ -1,6 +1,8 @@ #ifndef _ASM_ARM_JUMP_LABEL_H #define _ASM_ARM_JUMP_LABEL_H +#include + #ifndef __ASSEMBLY__ #include @@ -12,7 +14,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran { asm_volatile_goto("1:\n\t" WASM(nop) "\n\t" -".pushsection __jump_table, \"aw\"\n\t" +push_section_tbl_any(SECTION_DATA, __jump_table, aw) ".word 1b, %l[l_yes], %c0\n\t" ".popsection\n\t" : : "i" (&((char *)key)[branch]) : : l_yes); @@ -26,7 +28,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool { asm_volatile_goto("1:\n\t" WASM(b) " %l[l_yes]\n\t" -".pushsection __jump_table, \"aw\"\n\t" +push_section_tbl_any(SECTION_DATA, __jump_table, aw) ".word 1b, %l[l_yes], %c0\n\t" ".popsection\n\t" : : "i" (&((char *)key)[branch]) : : l_yes); diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h index 1b5e0e843c3a..aa52cd2607e3 100644 --- a/arch/arm64/include/asm/jump_label.h +++ b/arch/arm64/include/asm/jump_label.h @@ -19,6 +19,8 @@ #ifndef __ASM_JUMP_LABEL_H #define __ASM_JUMP_LABEL_H +#include + #ifndef __ASSEMBLY__ #include @@ -29,7 +31,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { asm goto("1: nop\n\t" -".pushsection __jump_table, \"aw\"\n\t" +push_section_tbl_any(SECTION_DATA, __jump_table, aw) ".align 3\n\t" ".quad 1b, %l[l_yes], %c0\n\t" ".popsection\n\t" @@ -43,7 +45,7 @@ l_yes: static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { asm goto("1: b %l[l_yes]\n\t" -".pushsection __jump_table, \"aw\"\n\t" +push_section_tbl_any(SECTION_DATA, __jump_table, aw)
[Xen-devel] [RFC v3 08/13] firmware/Makefile: force recompilation if makefile changes
If you modify the target asm we currently do not force the recompilation of the firmware files. The target asm is in the firmware/Makefile, peg this file as a dependency to require re-compilation of firmware targets when the asm changes. v3: introduced in this series Signed-off-by: Luis R. Rodriguez --- firmware/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/firmware/Makefile b/firmware/Makefile index e297e1b52636..fa3e81c2a97b 100644 --- a/firmware/Makefile +++ b/firmware/Makefile @@ -176,7 +176,8 @@ quiet_cmd_fwbin = MK_FW $@ wordsize_deps := $(wildcard include/config/64bit.h include/config/32bit.h \ include/config/ppc32.h include/config/ppc64.h \ include/config/superh32.h include/config/superh64.h \ - include/config/x86_32.h include/config/x86_64.h) + include/config/x86_32.h include/config/x86_64.h \ + firmware/Makefile) $(patsubst %,$(obj)/%.gen.S, $(fw-shipped-y)): %: $(wordsize_deps) $(call cmd,fwbin,$(patsubst %.gen.S,%,$@)) -- 2.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC v3 07/13] tables.h: add linker table support
A linker table is a data structure that is stitched together from items in multiple object files. Linux has historically implicitly used linker tables for ages, however they were all built in an adhoc manner which requires linker script modifications, per architecture. This adds a general linker table solution so that a new linker table can be implemented by changing C code only. The Linux linker table was originally based on Michael Brown's iPXE's linker table solution but has been significantly modified to fit Linux's use in its integration. The same philosophy is borrowed, extended and further simplified: Linker tables enable an extremely light weight linker build time solution for feature ordering and selection, this can help to both simplify init sequences in a generic fashion and helps avoiding code bit-rotting when desirable. Bit rotting avoidance is enabled by *granting the option* to force compilation of code and only enable linking object code in when specific features have been enabled. It accomplishes this by using linker sections, the lightweight feature ordering of code is enabled by taking advantage of the old binutils ld SORT() on features specific sections. Contrary to iPXE's solution, which strives to force compilation of all features, Linux' solution strives to encourage usage of linker tables to force compilation of code *only on designated code*. Linker tables can be used without requiring one to force code compilation of features that use it though, to take advantage of the simplification of init sequences. Linux code that uses linker tables *and* wishes to always require compilation but selectively linking must use new kbuild target object that helps accomplishes this, table-y. Linux code that uses linker tables but does not want to require forcing compilation can use the good 'ol obj-y targets. Used commit 67a10ef000cb7 from iPXE upstream [0] as the starting point for development and evaluation of Linux's integration, further changes made and evaluation of different linker table options for Linux are available on the linker-table userspace tree [1]. [0] git://git.ipxe.org/ipxe.git [1] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linker-tables.git/ v3: o addressed initial modular support test cases o added generic asm macros so linker tables can be used in asm code / C asm calls o section ranges are now split up into their own set of files o use asm/sections.h instead of linux/sections.h for the linker script o add a sections.h file for each architecture that was missing one, this is needed now as we'll be relying on sections.h for custom section types in code rather than custom architecture specific linker script hacks. o full rewrite at this point, decided to pick copyleft-next license for this work v2: o modified completely to match feedback by community, made equivalent modifications to userspace solution. This is pretty much a complete rewrite of how we present and use linker tables. By using standard sections we no longer have to make custom linker script extensions for each new linker table solution, you just pick a linker table type by section type. o extend documention considerably, including use of kdoc o drop ICC hacks per popular request to ignore such issues for now o use sections.h - this lets us streamline a clean use case of well documented sections. To help further with this make use of SECTION_TBL() to allow use of these in code and SECTION_TBL_ALL() on linker scripts, as well as SECTION_TBL_ALL_STR() on relocs.c when needed. Cc: Michael Brown Signed-off-by: Luis R. Rodriguez --- Documentation/DocBook/Makefile | 2 +- Documentation/DocBook/linker-tables.tmpl | 166 + Documentation/kbuild/makefiles.txt | 19 + arch/alpha/include/asm/tables.h | 6 + arch/arc/include/asm/tables.h| 6 + arch/arm/include/asm/tables.h| 6 + arch/arm64/include/asm/tables.h | 6 + arch/avr32/include/asm/tables.h | 6 + arch/blackfin/include/asm/tables.h | 6 + arch/c6x/include/asm/tables.h| 6 + arch/cris/include/asm/tables.h | 6 + arch/frv/include/asm/tables.h| 6 + arch/h8300/include/asm/tables.h | 6 + arch/hexagon/include/asm/tables.h| 6 + arch/ia64/include/asm/tables.h | 6 + arch/m32r/include/asm/tables.h | 6 + arch/m68k/include/asm/tables.h | 6 + arch/metag/include/asm/tables.h | 6 + arch/microblaze/include/asm/tables.h | 6 + arch/mips/include/asm/tables.h | 6 + arch/mn10300/include/asm/tables.h| 6 + arch/nios2/include/asm/tables.h | 6 + arch/openrisc/include/asm/tables.h | 6 + arch/parisc/include/asm/tables.h | 6 + arch/powerpc/include/asm/tables.h| 6 + arch/s390/include/asm/tables.h | 6 + arch/score/include/asm/tables.h | 6 + ar
[Xen-devel] [RFC v3 02/13] dell-smo8800: include uaccess.h
sections.h is currently included and it provides dell-smo8800 with what it needs, an upcoming change will decouple uaccess.h from sections.h. This driver needs to explicitly require uaccess.h before this change. v3: new to this series -- needed due to collateral of the split of old linker tables from tables.h into 3 files: sections.h, ranges.h and tables.h Signed-off-by: Luis R. Rodriguez --- drivers/platform/x86/dell-smo8800.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/platform/x86/dell-smo8800.c b/drivers/platform/x86/dell-smo8800.c index 0aec4fd4c48e..37e646034ef8 100644 --- a/drivers/platform/x86/dell-smo8800.c +++ b/drivers/platform/x86/dell-smo8800.c @@ -24,6 +24,7 @@ #include #include #include +#include struct smo8800_device { u32 irq; /* acpi device irq */ -- 2.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC v3 06/13] ranges.h: add helpers to build and identify Linux section ranges
Section ranges are on one of the types of custom sections types used in Linux. This provides a series of helpers for defining them and using them. Most importantly this also enables us to avoid modifying the linker script when we add new section range. It turns out a lot of custom sections are actually section ranges, and these are typically spelled out in their architecture specific asm/sections.h file -- we anable architectures to override what asm is used for section ranges but start by defult trusting the asm-generic version all around. v3: new in this series, uses copyleft-next Signed-off-by: Luis R. Rodriguez --- arch/alpha/include/asm/ranges.h | 6 arch/arc/include/asm/ranges.h| 6 arch/arm/include/asm/ranges.h| 6 arch/arm64/include/asm/ranges.h | 6 arch/avr32/include/asm/ranges.h | 6 arch/blackfin/include/asm/ranges.h | 6 arch/c6x/include/asm/ranges.h| 6 arch/cris/include/asm/ranges.h | 6 arch/frv/include/asm/ranges.h| 6 arch/h8300/include/asm/ranges.h | 6 arch/hexagon/include/asm/ranges.h| 6 arch/ia64/include/asm/ranges.h | 6 arch/m32r/include/asm/ranges.h | 6 arch/m68k/include/asm/ranges.h | 6 arch/metag/include/asm/ranges.h | 6 arch/microblaze/include/asm/ranges.h | 6 arch/mips/include/asm/ranges.h | 6 arch/mn10300/include/asm/ranges.h| 6 arch/nios2/include/asm/ranges.h | 6 arch/openrisc/include/asm/ranges.h | 6 arch/parisc/include/asm/ranges.h | 6 arch/powerpc/include/asm/ranges.h| 6 arch/s390/include/asm/ranges.h | 6 arch/score/include/asm/ranges.h | 6 arch/sh/include/asm/ranges.h | 6 arch/sparc/include/asm/ranges.h | 6 arch/tile/include/asm/ranges.h | 6 arch/um/include/asm/ranges.h | 6 arch/unicore32/include/asm/ranges.h | 6 arch/x86/include/asm/ranges.h| 6 arch/xtensa/include/asm/ranges.h | 6 include/asm-generic/ranges.h | 70 include/linux/ranges.h | 54 33 files changed, 310 insertions(+) create mode 100644 arch/alpha/include/asm/ranges.h create mode 100644 arch/arc/include/asm/ranges.h create mode 100644 arch/arm/include/asm/ranges.h create mode 100644 arch/arm64/include/asm/ranges.h create mode 100644 arch/avr32/include/asm/ranges.h create mode 100644 arch/blackfin/include/asm/ranges.h create mode 100644 arch/c6x/include/asm/ranges.h create mode 100644 arch/cris/include/asm/ranges.h create mode 100644 arch/frv/include/asm/ranges.h create mode 100644 arch/h8300/include/asm/ranges.h create mode 100644 arch/hexagon/include/asm/ranges.h create mode 100644 arch/ia64/include/asm/ranges.h create mode 100644 arch/m32r/include/asm/ranges.h create mode 100644 arch/m68k/include/asm/ranges.h create mode 100644 arch/metag/include/asm/ranges.h create mode 100644 arch/microblaze/include/asm/ranges.h create mode 100644 arch/mips/include/asm/ranges.h create mode 100644 arch/mn10300/include/asm/ranges.h create mode 100644 arch/nios2/include/asm/ranges.h create mode 100644 arch/openrisc/include/asm/ranges.h create mode 100644 arch/parisc/include/asm/ranges.h create mode 100644 arch/powerpc/include/asm/ranges.h create mode 100644 arch/s390/include/asm/ranges.h create mode 100644 arch/score/include/asm/ranges.h create mode 100644 arch/sh/include/asm/ranges.h create mode 100644 arch/sparc/include/asm/ranges.h create mode 100644 arch/tile/include/asm/ranges.h create mode 100644 arch/um/include/asm/ranges.h create mode 100644 arch/unicore32/include/asm/ranges.h create mode 100644 arch/x86/include/asm/ranges.h create mode 100644 arch/xtensa/include/asm/ranges.h create mode 100644 include/asm-generic/ranges.h create mode 100644 include/linux/ranges.h diff --git a/arch/alpha/include/asm/ranges.h b/arch/alpha/include/asm/ranges.h new file mode 100644 index ..b7bc4107bc75 --- /dev/null +++ b/arch/alpha/include/asm/ranges.h @@ -0,0 +1,6 @@ +#ifndef _ASM_ALPHA_RANGES_H +#define _ASM_ALPHA_RANGES_H + +#include + +#endif /* _ASM_ALPHA_RANGES_H */ diff --git a/arch/arc/include/asm/ranges.h b/arch/arc/include/asm/ranges.h new file mode 100644 index ..a761243bf79f --- /dev/null +++ b/arch/arc/include/asm/ranges.h @@ -0,0 +1,6 @@ +#ifndef _ASM_ARC_RANGES_H +#define _ASM_ARC_RANGES_H + +#include + +#endif /* _ASM_ARC_RANGES_H */ diff --git a/arch/arm/include/asm/ranges.h b/arch/arm/include/asm/ranges.h new file mode 100644 index ..fb059e205f90 --- /dev/null +++ b/arch/arm/include/asm/ranges.h @@ -0,0 +1,6 @@ +#ifndef _ASM_ARM_RANGES_H +#define _ASM_ARM_RANGES_H + +#include + +#endif /* _ASM_ARM_RANGES_H */ diff --git a/arch/arm64/include/asm/ranges.h b/arch/arm64/include/asm/ranges.h new fil
[Xen-devel] [RFC v3 03/13] scripts/module-common.lds: enable generation
scripts/module-common.lds is currently pretty static, in the future this may change and we will want access to kernel macros to help expands certain areas. To get access to use macros we need to generate module-common.lds from module-common.lds.S, for now though only enable the generation. We'll later expand on this as needed. Since this file is now generated add it to .gitignore. v3: new to this series Signed-off-by: Luis R. Rodriguez --- .gitignore | 2 ++ Makefile | 6 +- scripts/Makefile.modpost | 2 +- scripts/{module-common.lds => module-common.lds.S} | 0 4 files changed, 8 insertions(+), 2 deletions(-) rename scripts/{module-common.lds => module-common.lds.S} (100%) diff --git a/.gitignore b/.gitignore index 2be25f771bd8..14a94d939a6f 100644 --- a/.gitignore +++ b/.gitignore @@ -113,3 +113,5 @@ all.config # Kdevelop4 *.kdev4 + +scripts/module-common.lds diff --git a/Makefile b/Makefile index caa33e007a8c..c56cca906769 100644 --- a/Makefile +++ b/Makefile @@ -408,6 +408,10 @@ KBUILD_AFLAGS_MODULE := -DMODULE KBUILD_CFLAGS_MODULE := -DMODULE KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds +$(srctree)/scripts/module-common.lds: $(srctree)/scripts/module-common.lds.S + $(Q)$(CC) $(CFLAGS) $(INCLUDES) $(LINUXINCLUDE) -E -P \ + -D__ASSEMBLY__ -DLINKER_SCRIPT -o $@ $< + # Read KERNELRELEASE from include/config/kernel.release (if it exists) KERNELRELEASE = $(shell cat include/config/kernel.release 2> /dev/null) KERNELVERSION = $(VERSION)$(if $(PATCHLEVEL),.$(PATCHLEVEL)$(if $(SUBLEVEL),.$(SUBLEVEL)))$(EXTRAVERSION) @@ -1178,7 +1182,7 @@ all: modules # using awk while concatenating to the final file. PHONY += modules -modules: $(vmlinux-dirs) $(if $(KBUILD_BUILTIN),vmlinux) modules.builtin +modules: $(srctree)/scripts/module-common.lds $(vmlinux-dirs) $(if $(KBUILD_BUILTIN),vmlinux) modules.builtin $(Q)$(AWK) '!x[$$0]++' $(vmlinux-dirs:%=$(objtree)/%/modules.order) > $(objtree)/modules.order @$(kecho) ' Building modules, stage 2.'; $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost index 1366a94b6c39..2d8aff7735d6 100644 --- a/scripts/Makefile.modpost +++ b/scripts/Makefile.modpost @@ -121,7 +121,7 @@ quiet_cmd_ld_ko_o = LD [M] $@ $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \ -o $@ $(filter-out FORCE,$^) -$(modules): %.ko :%.o %.mod.o FORCE +$(modules): %.ko : $(srctree)/scripts/module-common.lds %.o %.mod.o FORCE $(call if_changed,ld_ko_o) targets += $(modules) diff --git a/scripts/module-common.lds b/scripts/module-common.lds.S similarity index 100% rename from scripts/module-common.lds rename to scripts/module-common.lds.S -- 2.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC v3 13/13] kprobes: port blacklist kprobes to linker table
kprobe makes use of two sections, the one dealing with the actual kprobes was recently ported using the standard section range API. The blacklist functionality of kprobes is still using a custom section and declaring its custom section using the linker script as follows: type Linux-section custom section name beginend table .init.data_kprobe_blacklist__start_kprobe_blacklist __stop_kprobe_blacklist This ports the _kprobe_blacklist custom section to the standard Linux linker table API allowing us remove all the custom blacklist kprobe section declarations from the linker script. This has been tested by trying to register a kprobe on a blacklisted symbol (these are declared with NOKPROBE_SYMBOL()), and confirms that this fails to work as expected. This was tested with: # insmod samples/kprobes/kprobe_example.ko symbol="get_kprobe" This fails to load as expected with: insmod: ERROR: could not insert module samples/kprobes/kprobe_example.ko: Invalid parameters v3: this patch was introduced in this series Signed-off-by: Luis R. Rodriguez --- include/asm-generic/vmlinux.lds.h | 10 -- include/linux/kprobes.h | 5 +++-- kernel/kprobes.c | 11 --- 3 files changed, 7 insertions(+), 19 deletions(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 1664050e6560..0e4df8c61c18 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -113,15 +113,6 @@ #define BRANCH_PROFILE() #endif -#ifdef CONFIG_KPROBES -#define KPROBE_BLACKLIST() . = ALIGN(8); \ - VMLINUX_SYMBOL(__start_kprobe_blacklist) = .; \ - *(_kprobe_blacklist) \ - VMLINUX_SYMBOL(__stop_kprobe_blacklist) = .; -#else -#define KPROBE_BLACKLIST() -#endif - #ifdef CONFIG_EVENT_TRACING #define FTRACE_EVENTS(). = ALIGN(8); \ VMLINUX_SYMBOL(__start_ftrace_events) = .; \ @@ -519,7 +510,6 @@ *(SECTION_INIT_RODATA) \ FTRACE_EVENTS() \ TRACE_SYSCALLS()\ - KPROBE_BLACKLIST() \ MEM_DISCARD(init.rodata)\ CLK_OF_TABLES() \ RESERVEDMEM_OF_TABLES() \ diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 3f46b282a3f9..c9bb9caef70c 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -43,9 +43,11 @@ #ifdef CONFIG_KPROBES #include +#include #include DECLARE_SECTION_RANGE(kprobes); +DECLARE_LINKTABLE(unsigned long, _kprobe_blacklist); /* kprobe_status settings */ #define KPROBE_HIT_ACTIVE 0x0001 @@ -490,8 +492,7 @@ static inline int enable_jprobe(struct jprobe *jp) * by using this macro. */ #define __NOKPROBE_SYMBOL(fname) \ -static unsigned long __used\ - __attribute__((section("_kprobe_blacklist"))) \ +static LINKTABLE_INIT_DATA(_kprobe_blacklist, all) \ _kbl_addr_##fname = (unsigned long)fname; #define NOKPROBE_SYMBOL(fname) __NOKPROBE_SYMBOL(fname) #else diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 387605682622..4801aa3b4adf 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2053,14 +2053,13 @@ NOKPROBE_SYMBOL(dump_kprobe); * since a kprobe need not necessarily be at the beginning * of a function. */ -static int __init populate_kprobe_blacklist(unsigned long *start, -unsigned long *end) +static int __init populate_kprobe_blacklist(void) { unsigned long *iter; struct kprobe_blacklist_entry *ent; unsigned long entry, offset = 0, size = 0; - for (iter = start; iter < end; iter++) { + LINKTABLE_FOR_EACH(iter, _kprobe_blacklist) { entry = arch_deref_entry_point((void *)*iter); if (!kernel_text_address(entry) || @@ -2125,8 +2124,7 @@ static struct notifier_block kprobe_module_nb = { }; /* Markers of _kprobe_blacklist section */ -extern unsigned long __start_kprobe_blacklist[]; -extern unsigned long __stop_kprobe_blacklist[]; +DEFINE_LINKTABLE_INIT_DATA(unsigned long, _kprobe_blacklist); /* Actual kprobes section range */ DEFINE_SECTION_RANGE(kprobes, SECTION_TEXT); @@ -2143,8 +2141,7 @@ static int __init init_kprobes(void) raw_spin_lock_init(&(kretprobe_table_locks[i].lock)); } - err = populate_kprobe_blacklist(__start_kprobe_blacklist, - __stop_kprobe_blacklist); + err = populate_kprobe
[Xen-devel] [RFC v3 05/13] sections.h: add sections header to collect all section info
Linux makes extensive use of custom ELF header sections, documentation for these are well scatterred. Unify this documentation in a central place and provide helpers to build custom Linux sections. We are now generalizing sections.h in code to enable avoiding modifying the linker when we want to add new custom Linux sections but to make this generally useful we need to ensure all architectures also have their own asm/sections.h, this is now also used by the custom series of helpers to an architecture's linker script: asm-generic/vmlinux.lds.h. v3: o add missing sections.h for architectures that did not have it o move generic sections to asm-generic/sections.h o add generic asm helpers section_type(), section_type_asmtype(), push_section_type() -- these helpers enable easy use for for later declaring and using of custom linux sections using more standard APIs in both C code, asm code (C asm calls, or asm files), enabling future standardized section types to be more immediately accessible to asm code, not just C code. Note for ASM_CMD_SEP we use by default "\n", architectures needed to override can do so on their own sections.h prior to inclusion of asm-generic/sections.h Signed-off-by: Luis R. Rodriguez --- Documentation/DocBook/Makefile| 3 +- Documentation/DocBook/sections.tmpl | 112 + arch/alpha/include/asm/sections.h | 6 + arch/arm64/include/asm/sections.h | 6 + arch/avr32/include/asm/sections.h | 6 + arch/cris/include/asm/sections.h | 6 + arch/h8300/include/asm/sections.h | 6 + arch/hexagon/include/asm/sections.h | 6 + arch/m32r/include/asm/sections.h | 6 + arch/m68k/include/asm/sections.h | 6 + arch/metag/include/asm/sections.h | 6 + arch/mips/include/asm/sections.h | 6 + arch/mn10300/include/asm/sections.h | 6 + arch/nios2/include/asm/sections.h | 6 + arch/openrisc/include/asm/sections.h | 6 + arch/score/include/asm/sections.h | 6 + arch/unicore32/include/asm/sections.h | 6 + arch/xtensa/include/asm/sections.h| 6 + include/asm-generic/sections.h| 300 ++ include/asm-generic/vmlinux.lds.h | 27 +-- include/linux/sections.h | 123 ++ 21 files changed, 647 insertions(+), 14 deletions(-) create mode 100644 Documentation/DocBook/sections.tmpl create mode 100644 arch/alpha/include/asm/sections.h create mode 100644 arch/arm64/include/asm/sections.h create mode 100644 arch/avr32/include/asm/sections.h create mode 100644 arch/cris/include/asm/sections.h create mode 100644 arch/h8300/include/asm/sections.h create mode 100644 arch/hexagon/include/asm/sections.h create mode 100644 arch/m32r/include/asm/sections.h create mode 100644 arch/m68k/include/asm/sections.h create mode 100644 arch/metag/include/asm/sections.h create mode 100644 arch/mips/include/asm/sections.h create mode 100644 arch/mn10300/include/asm/sections.h create mode 100644 arch/nios2/include/asm/sections.h create mode 100644 arch/openrisc/include/asm/sections.h create mode 100644 arch/score/include/asm/sections.h create mode 100644 arch/unicore32/include/asm/sections.h create mode 100644 arch/xtensa/include/asm/sections.h create mode 100644 include/linux/sections.h diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 8e6dd4b14314..1b2ac4731418 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -17,7 +17,8 @@ DOCBOOKS := z8530book.xml device-drivers.xml \ 80211.xml debugobjects.xml sh.xml regulator.xml \ alsa-driver-api.xml writing-an-alsa-driver.xml \ tracepoint.xml media_api.xml w1.xml \ - writing_musb_glue_layer.xml crypto-API.xml iio.xml + writing_musb_glue_layer.xml crypto-API.xml iio.xml \ + sections.xml include Documentation/DocBook/media/Makefile diff --git a/Documentation/DocBook/sections.tmpl b/Documentation/DocBook/sections.tmpl new file mode 100644 index ..2d14eb20e99c --- /dev/null +++ b/Documentation/DocBook/sections.tmpl @@ -0,0 +1,112 @@ + +http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"; []> + + +Linux ELF sections + + Explains Linux ELF sections + + + + 2016 + Luis R. Rodriguez + + + + +Luis +Rodriguez + + mcg...@kernel.org + + + + + + +This documentation is free software; you can redistribute +it and/or modify it under the terms of the GNU General Public +License version 2 as published by the Free Software Foundation. + + +This documentation is distributed in the hope that it will be +useful, but WITHOUT ANY WARRANTY; without even the implied +warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +See the GNU General Public License for more detail