Re: [Xen-devel] [libvirt test] 140186: regressions - FAIL
On 8/16/19 7:01 AM, osstest service owner wrote: > flight 140186 libvirt real [real] > http://logs.test-lab.xenproject.org/osstest/logs/140186/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > build-amd64-libvirt 6 libvirt-build fail REGR. vs. 139829 > build-i386-libvirt 6 libvirt-build fail REGR. vs. 139829 > build-arm64-libvirt 6 libvirt-build fail REGR. vs. 139829 > build-armhf-libvirt 6 libvirt-build fail REGR. vs. 139829 Should be fixed now by commit 3b7c5ab9 https://libvirt.org/git/?p=libvirt.git;a=commit;h=3b7c5ab983f4655ae02b8af4517d89839530ee5f Regards, Jim ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [libvirt] domXML modeling question
Adding xen-devel to cc in case anyone there wants to comment on my latest proposal... On 2/20/19 5:20 PM, Jim Fehlig wrote: There have been a few requests [1][2] to support Xen's max_grant_frames setting in libvirt domXML, but I'm not quite sure how to model it. The documentation [3] on this setting states: Specify the maximum number of grant frames the domain is allowed to have. This value controls how many pages the domain is able to grant access to for other domains, needed e.g. for the operation of paravirtualized devices. The default is settable via xl.conf(5). I've sent a patch to introduce an analogous default in the libvirt libxl driver https://www.redhat.com/archives/libvir-list/2019-March/msg00123.html It smells of a setting, e.g. the amount of memory a domain can share, but doesn't map to any of the existing settings. A new subelement doesn't feel right. Can anyone suggest a better way of modeling max_grant_frames? After discussing the max_grant_frames setting a bit more with Juergen, I had the idea to model it as I/O buffer space (or DMA space) of a xenbus "controller". All PV devices in the guest connect to the xenbus controller and make use of the available I/O buffer space. Guests with more PV devices requiring more buffer space can increase the space on the xenbus controller device. One small wrinkle in this idea is that we currently don't model xenbus in libvirt. I'd need to add support for a new xenbus controller type and start implicitly creating it when creating guests with PV devices, similar to auto-creation of controllers in the qemu driver. Also, there is no existing controller setting for specifying buffer space. Perhaps a 'ram' attribute could be added, similar to specifying memory for devices? E.g. Any opinion on this approach? Or other ideas for modeling this setting in libvirt?
Regards, Jim Another option I considered is setting the value based on the number of PV devices, but I think that flies in the face of libvirt's policy of not dictating policy. Regardless of domain config modeling, I can work on a driver-wide setting in libxl.conf, similar to Xen's xl.conf(5) global. Regards, Jim [1] https://www.redhat.com/archives/libvir-list/2018-April/msg00216.html [2] https://www.redhat.com/archives/libvirt-users/2019-January/msg00011.html [3] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/man/xl.cfg.5.pod.in;h=ad81af1ed8cc983c76b5ec2c3aa02e28f042cc63;hb=HEAD#l569
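A minimal sketch of how the xenbus-controller idea could resolve the value, assuming a hypothetical `struct xenbus_controller` with a `max_grant_frames` field (neither exists in libvirt; the names are illustrative only). The per-domain value wins when set; otherwise a driver-wide default, like the proposed libxl.conf setting, applies. The result maps onto the `max_grant_frames` option of xl.cfg(5).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a xenbus controller; not a libvirt type. */
struct xenbus_controller {
    unsigned int max_grant_frames;   /* 0 means "not set in domXML" */
};

/*
 * Use the per-domain setting if present, otherwise fall back to the
 * driver-wide default (analogous to the xl.conf(5) global).
 */
static unsigned int
effective_max_grant_frames(const struct xenbus_controller *ctrl,
                           unsigned int driver_default)
{
    if (ctrl != NULL && ctrl->max_grant_frames > 0)
        return ctrl->max_grant_frames;
    return driver_default;
}

/* Render the equivalent xl.cfg(5) line; returns snprintf()'s result. */
static int
format_xl_cfg_line(char *buf, size_t len, unsigned int frames)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "max_grant_frames = %u", frames);
}
```

Treating 0 as "unset" lets a domain config stay silent and inherit whatever the host administrator configured, so libvirt does not have to derive a value from the number of PV devices.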
Re: [Xen-devel] [XEN PATCH for-4.13 v2 1/9] libxl: Offer API versions 0x040700 and 0x040800
On 10/10/19 9:11 AM, Ian Jackson wrote: > According to git log -G: > > 0x040700 was introduced in 304400459ef0 (aka 4.7.0-rc1~481) > "tools/libxl: rename remus device to checkpoint device" > > 0x040800 was introduced in 57f8b13c7240 (aka 4.8.0-rc1~437) > "libxl: memory size in kb requires 64 bit variable" > > It is surprising that no-one noticed this. I am now noticing it :-(. As Anthony noted in V1, libvirt uses LIBXL_API_VERSION and currently has it set to 0x040500. I'm attempting to bump libvirt's minimum supported Xen version to 4.9.0 and for that would use 0x040800, but it's not possible without this commit backported through 4.9 and picked up and released by all the downstreams. Any ideas on how to use the API changes through 0x040800, but avoid the ones introduced in 0x041300, would be much appreciated. Regards, Jim
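For context on the version values: libxl.h selects which API variant to expose from the `LIBXL_API_VERSION` value defined before the header is included (libvirt's build passes it as a `-DLIBXL_API_VERSION=...` compiler flag). The value packs a Xen release as 0xMMmmuu, with each byte written so its hex digits read like the decimal release number: 0x040800 is 4.8.0 and 0x041300 is 4.13.0. The decoder below is a hypothetical helper, not part of libxl. Because each byte is laid out this way, plain numeric comparison also orders releases correctly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * The version must be defined before including libxl.h, e.g.:
 *   #define LIBXL_API_VERSION 0x040800
 *   #include <libxl.h>
 * (left as a comment so this sketch builds without Xen headers).
 */

/* Decode 0xMMmmuu into "M.m.u"; each byte is printed with %x. */
static int
format_libxl_api_version(unsigned long v, char *buf, size_t len)
{
    unsigned int major = (unsigned int)((v >> 16) & 0xff);
    unsigned int minor = (unsigned int)((v >> 8) & 0xff);
    unsigned int micro = (unsigned int)(v & 0xff);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%x.%x.%x", major, minor, micro);
}
```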
Re: [Xen-devel] [OSSTEST PATCH 1/2] ts-libvirt-build: Provide PKG_CONFIG_PATH
On 11/12/19 5:09 AM, Ian Jackson wrote: > In osstest we do not install the xen tree in /usr/local because the > build environment is shared with many different build jobs which might > be using different versions of Xen. We put it in a job-specific > directory in ~osstest on the build host, and set environment variables > to ensure that it all gets picked up. > > Recent versions of libvirt insist on finding xenlight.pc; otherwise > they disable libxl support. So we must add a PKG_CONFIG_PATH setting. Sorry. There was a hack to work around a Fedora 28 bug, but now that Fedora 28 is EOL the hack has been removed: https://libvirt.org/git/?p=libvirt.git;a=commit;h=18981877d2e20390a79d068861a24e716f8ee422 > (In all cases, contrary to the usual protocol for path-like variables, > we do not append but instead simply set the variable. This is OK > because this is an osstest build script run via ssh to the build host, > so the variables won't have been set already.) > > CC: Jim Fehlig > Signed-off-by: Ian Jackson > --- > ts-libvirt-build | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/ts-libvirt-build b/ts-libvirt-build > index bc08190a..2a363f43 100755 > --- a/ts-libvirt-build > +++ b/ts-libvirt-build > @@ -60,6 +60,7 @@ sub config() { > cd libvirt > CFLAGS="-g -I$xenprefix/include/" \\ > LDFLAGS="-g -L$xenprefix/lib/ -Wl,-rpath-link=$xenprefix/lib/" \\ > +PKG_CONFIG_PATH="$xenprefix/lib/pkgconfig/" \\ > GNULIB_SRCDIR=$builddir/libvirt/$gnulib->{Path} \\ > ./autogen.sh --no-git \\ >--with-libxl --without-xen --without-xenapi > --without-selinux \\ Unrelated, but the legacy xen and xenapi drivers have been removed, so the --without-{xen,xenapi} options can be dropped. Regards, Jim
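The difference between simply setting PKG_CONFIG_PATH, as the osstest script does, and extending an existing value in the usual way can be sketched in C with POSIX `setenv()`. The helper name and the 4096-byte buffer are assumptions for illustration; the `$prefix/lib/pkgconfig/` layout follows the patch above. When extending, this sketch prepends so the Xen build's directory is searched first.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Point pkg-config at <prefix>/lib/pkgconfig/ so xenlight.pc is found.
 * keep_existing == 0: overwrite, as osstest does in a fresh ssh session.
 * keep_existing != 0: put our directory first and keep any old value.
 * Returns 0 on success, -1 on truncation or setenv() failure.
 */
static int
set_pkg_config_path(const char *prefix, int keep_existing)
{
    char value[4096];
    const char *old = getenv("PKG_CONFIG_PATH");
    int n;

    assert(prefix != NULL);
    /* The old value is copied into 'value' before setenv() runs. */
    if (keep_existing && old != NULL && *old != '\0')
        n = snprintf(value, sizeof(value), "%s/lib/pkgconfig/:%s", prefix, old);
    else
        n = snprintf(value, sizeof(value), "%s/lib/pkgconfig/", prefix);
    if (n < 0 || (size_t)n >= sizeof(value))
        return -1;
    return setenv("PKG_CONFIG_PATH", value, 1) == 0 ? 0 : -1;
}
```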
Re: [Xen-devel] [OSSTEST PATCH 2/2] ts-libvirt-build: Do an out-of-tree build
On 11/12/19 5:09 AM, Ian Jackson wrote: > Recent versions of libvirt do not support in-tree builds (!) I assumed libvirt's gradual move from autotools to meson would affect OSSTEST, but later rather than sooner. Sorry for not mentioning it earlier, but now you have been warned that libvirt is moving to meson :-). Meson has a strict separation between source and build directories and some preparatory patches were pushed that force srcdir != builddir https://www.redhat.com/archives/libvir-list/2019-October/msg01681.html Daniel posted a note about this change yesterday https://www.redhat.com/archives/libvir-list/2019-November/msg00299.html I didn't read libvirt mail yesterday otherwise I would have forwarded that to xen-devel. I need to be more proactive with libvirt changes that might affect OSSTEST... Regards, Jim > > Cope with this by always building in a subdirectory `build' (a > subdirectory of the source tree); this is the arrangement which the > libvirt upstream messages and documentation now seem to recommend (at > least where things have been updated). > > I compared the differences in build output between the results of this > branch and a previous passing xen-unstable flight. The libvirt > library version increased and a file >usr/local/share/libvirt/cpu_map/arm_features.xml > appeared. I think this is just due to changes in the libvirt version, > 2cff65e4c60e..70218e10bcde, in particular 0de541bfc575 >cpu_map: Ship arm_features.xml > > I also tested that a test job, built with current libvirt and these > osstest changes, passes as expected. 
> > CC: Jim Fehlig > Signed-off-by: Ian Jackson > Tested-by: Ian Jackson > --- > ts-libvirt-build | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/ts-libvirt-build b/ts-libvirt-build > index 2a363f43..e799f003 100755 > --- a/ts-libvirt-build > +++ b/ts-libvirt-build > @@ -58,11 +58,13 @@ sub config() { > my $gnulib = submodule_find($submodules, "gnulib"); > target_cmd_build($ho, 3600, $builddir, < cd libvirt > + mkdir build > + cd build > CFLAGS="-g -I$xenprefix/include/" \\ > LDFLAGS="-g -L$xenprefix/lib/ -Wl,-rpath-link=$xenprefix/lib/" \\ > PKG_CONFIG_PATH="$xenprefix/lib/pkgconfig/" \\ > GNULIB_SRCDIR=$builddir/libvirt/$gnulib->{Path} \\ > -./autogen.sh --no-git \\ > +../autogen.sh --no-git \\ >--with-libxl --without-xen --without-xenapi > --without-selinux \\ >--without-lxc --without-vbox --without-uml \\ >--without-qemu --without-openvz --without-vmware \\ > @@ -72,9 +74,9 @@ END > > sub build() { > target_cmd_build($ho, 3600, $builddir, < -cd libvirt > -(make $makeflags 2>&1 && touch ../build-ok-stamp) |tee ../log > -test -f ../build-ok-stamp #/ > +cd libvirt/build > +(make $makeflags 2>&1 && touch ../../build-ok-stamp) |tee ../log > +test -f ../../build-ok-stamp #/ > echo ok. > END > } > @@ -82,7 +84,7 @@ END > sub install() { > target_cmd_build($ho, 300, $builddir, < mkdir -p dist > -cd libvirt > +cd libvirt/build > make $makeflags install DESTDIR=$builddir/dist > mkdir -p $builddir/dist/etc/init.d > END >
Re: [Xen-devel] [OSSTEST PATCH 2/2] ts-libvirt-build: Do an out-of-tree build
On 11/12/19 9:10 AM, Ian Jackson wrote: > Hi. Thanks for the information. > > Jim Fehlig writes ("Re: [OSSTEST PATCH 2/2] ts-libvirt-build: Do an > out-of-tree build"): >> I assumed libvirt's gradual move from autotools to meson would >> affect OSSTEST, but later rather than sooner. Sorry for not >> mentioning it earlier, but now you have been warned that libvirt is >> moving to meson :-). Meson has a strict separation between source >> and build directories and some preparatory patches were pushed that >> force srcdir != builddir >> >> https://www.redhat.com/archives/libvir-list/2019-October/msg01681.html > > I read this and some of it is a bit concerning. Does all of this >src: [stuff] generate source files into build directory > mean that previously only in-tree builds were supported and that > therefore there is no one set of build runes that will work both > before and after these changes ? Both in-tree and VPATH (out-of-tree) builds were previously supported. But questions around this work are probably best answered by the author. Adding Pavel to cc. Pavel, for context, see Ian's OSSTEST patches to accommodate recent changes to libvirt's build system https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00514.html Regards, Jim
Re: [Xen-devel] [PATCH v2] xl/libxl: add pvcalls support
On 03/29/2018 04:07 PM, Stefano Stabellini wrote: Add pvcalls support to libxl and xl. Create the appropriate pvcalls entries in xenstore. Signed-off-by: Stefano Stabellini --- Changes in v2: - rename pvcalls to pvcallsif internally in libxl to avoid `pvcallss' --- docs/misc/xenstore-paths.markdown| 9 + tools/libxl/Makefile | 2 +- tools/libxl/libxl.h | 10 ++ tools/libxl/libxl_create.c | 4 tools/libxl/libxl_internal.h | 1 + tools/libxl/libxl_pvcalls.c | 37 tools/libxl/libxl_types.idl | 7 +++ tools/libxl/libxl_types_internal.idl | 1 + tools/xl/xl_parse.c | 37 +++- 9 files changed, 106 insertions(+), 2 deletions(-) create mode 100644 tools/libxl/libxl_pvcalls.c diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index 7be2592..77d1a36 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -299,6 +299,11 @@ A virtual scsi device frontend. Described by A virtual usb device frontend. Described by [xen/include/public/io/usbif.h][USBIF] + ~/device/pvcalls/$DEVID/* [] + +Paravirtualized POSIX function calls frontend. Described by +[docs/misc/pvcalls.markdown][PVCALLS] + ~/console/* [] The primary PV console device. Described in [console.txt](console.txt) @@ -377,6 +382,10 @@ A PV SCSI backend. A PV USB backend. Described by [xen/include/public/io/usbif.h][USBIF] + + ~/backend/pvcalls/$DOMID/$DEVID/* [] + +A PVCalls backend. Described in [docs/misc/pvcalls.markdown][PVCALLS]. 
~/backend/console/$DOMID/$DEVID/* [] diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index 917ceb0..035e66e 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -140,7 +140,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \ libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \ libxl_9pfs.o libxl_domain.o libxl_vdispl.o \ -$(LIBXL_OBJS-y) +libxl_pvcalls.o $(LIBXL_OBJS-y) LIBXL_OBJS += libxl_genid.o LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index eca0ea2..c4eccc5 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -2006,6 +2006,16 @@ int libxl_device_p9_destroy(libxl_ctx *ctx, uint32_t domid, const libxl_asyncop_how *ao_how) LIBXL_EXTERNAL_CALLERS_ONLY; +/* pvcalls interface */ +int libxl_device_pvcallsif_remove(libxl_ctx *ctx, uint32_t domid, + libxl_device_pvcallsif *pvcallsif, + const libxl_asyncop_how *ao_how) + LIBXL_EXTERNAL_CALLERS_ONLY; +int libxl_device_pvcallsif_destroy(libxl_ctx *ctx, uint32_t domid, + libxl_device_pvcallsif *pvcallsif, + const libxl_asyncop_how *ao_how) + LIBXL_EXTERNAL_CALLERS_ONLY; + /* PCI Passthrough */ int libxl_device_pci_add(libxl_ctx *ctx, uint32_t domid, libxl_device_pci *pcidev, diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index c498135..c43f391 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1374,6 +1374,10 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev, for (i = 0; i < d_config->num_p9s; i++) libxl__device_add(gc, domid, &libxl__p9_devtype, &d_config->p9s[i]); +for (i = 0; i < d_config->num_pvcallsifs; i++) +libxl__device_add(gc, domid, &libxl__pvcallsif_devtype, + &d_config->pvcallsifs[i]); + switch (d_config->c_info.type) { case LIBXL_DOMAIN_TYPE_HVM: { diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 506687f..50209ff 
100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3648,6 +3648,7 @@ extern const struct libxl_device_type libxl__usbdev_devtype; extern const struct libxl_device_type libxl__pcidev_devtype; extern const struct libxl_device_type libxl__vdispl_devtype; extern const struct libxl_device_type libxl__p9_devtype; +extern const struct libxl_device_type libxl__pvcallsif_devtype; extern const struct libxl_device_type *device_type_tbl[]; diff --git a/tools/libxl/libxl_pvcalls.c b/tools/libxl/libxl_pvcalls.c new file mode 100644 index 000..bb6f307 --- /dev/null +++ b/tools/libxl/libxl_pvcalls.c @@ -0,0 +1,37 @@ +/* + * Copyright (C) 2018 Aporeto + * Author Stefano Stabellini + * + * This program is free software; you ca
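The two xenstore entries documented in the pvcalls patch expand as follows, where `~` stands for `/local/domain/$DOMID` of the domain that owns the entry. The frontend lives under the guest, and the backend lives under the backend domain, keyed by the guest's domid and the device id. This is a small sketch with hypothetical helper names; libxl builds these paths through its own generic device code, not through functions like these.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ~/device/pvcalls/$DEVID, with ~ = /local/domain/<frontend domid> */
static int
pvcalls_frontend_path(char *buf, size_t len,
                      unsigned int domid, unsigned int devid)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "/local/domain/%u/device/pvcalls/%u",
                    domid, devid);
}

/* ~/backend/pvcalls/$DOMID/$DEVID, with ~ = /local/domain/<backend domid> */
static int
pvcalls_backend_path(char *buf, size_t len, unsigned int backend_domid,
                     unsigned int frontend_domid, unsigned int devid)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "/local/domain/%u/backend/pvcalls/%u/%u",
                    backend_domid, frontend_domid, devid);
}
```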
Re: [Xen-devel] [xen-4.8-testing test] 124100: regressions - FAIL
On 06/13/2018 05:18 AM, Ian Jackson wrote: Jim: please read down to where I discuss test-amd64-amd64-libvirt-pair. If you have any insight I'd appreciate it. Let me know if you want me to preserve the logs, which will otherwise expire in a few weeks. Whoa, sorry for the delay. This mail found a dumb bug in my filter for xen-devel mail. test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail pass in 123701 From the log: 2018-06-12 20:59:40 Z executing ssh ... root@172.16.144.61 virsh migrate --live debian.guest.osstest xen+ssh://joubertin0 error: Timed out during operation: cannot acquire state change lock 2018-06-12 21:00:16 Z command nonzero waitstatus 256: [..] The libvirt libxl logs seem to show libxl doing a successful migration. With the long delay, I'm afraid the logs have expired. Do you still see the problem? All the recent runs seem to be plagued with libvirt's change to require GnuTLS https://libvirt.org/git/?p=libvirt.git;a=commit;h=60d9ad6f1e42618fce10baeb0f02c35e5ebd5b24 Looking at the logs I see this: 2018-06-12 21:00:16.784+: 3507: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24947) That job number looks like it's about right for a pid, but I think it must be a thread because it doesn't show up in the ps output. Likely a libvirtd worker thread doing something that requires modifying the state of virDomainObj. I did see this: Jun 12 21:00:20 joubertin0 logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking. but that seems to be after the failure. A wild guess, but is it possible thread 24947 is running a domain create operation, which includes executing vif-bridge, that is taking longer than expected to complete? I don't have an explanation. I don't really know what this lock is. It's a lock that serializes domain state modifications (changing virDomainObj). Wait time for the lock is currently hardcoded to 30sec. 
The thread emitting the warning exceeded the timeout while waiting for 24947 to finish whatever it was doing. Regards, Jim
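The behaviour described above can be modelled with a mutex, a condition variable and `pthread_cond_timedwait()`: one modify job per domain at a time, and a bounded wait for anyone else who wants it. This is a simplified sketch with hypothetical names, not libvirt's actual implementation. It also does not track which thread owns the job. The timeout is a parameter here so a test can keep it short, whereas the libxl driver uses a fixed 30 seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* One modify job at a time per domain (hypothetical, simplified). */
struct job_lock {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int active;                 /* nonzero while some thread owns the job */
};

#define JOB_LOCK_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Returns 0 once the job is owned, or -1 (errno = ETIMEDOUT) on timeout. */
static int
job_begin(struct job_lock *lk, int timeout_sec)
{
    struct timespec deadline;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME time. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&lk->mutex);
    while (lk->active) {
        if (pthread_cond_timedwait(&lk->cond, &lk->mutex, &deadline) != 0)
            break;              /* ETIMEDOUT (or an error): stop waiting */
    }
    if (lk->active) {
        pthread_mutex_unlock(&lk->mutex);
        errno = ETIMEDOUT;
        return -1;
    }
    lk->active = 1;
    pthread_mutex_unlock(&lk->mutex);
    return 0;
}

/* Release the job and wake every waiter. */
static void
job_end(struct job_lock *lk)
{
    pthread_mutex_lock(&lk->mutex);
    assert(lk->active);
    lk->active = 0;
    pthread_cond_broadcast(&lk->cond);
    pthread_mutex_unlock(&lk->mutex);
}
```

In this model, a holder that runs longer than the timeout (such as a slow domain create) causes exactly the "cannot acquire state change lock" style of failure seen in the log.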
Re: [Xen-devel] libvirtd hang on CentOS6 after latest updates
On 07/22/2018 04:03 PM, Karel Hendrych wrote: Hi, I am seeing frequent libvirtd hangs (clients not responding) after last CentOS6-Xen update : xen-devel is not the best place to seek help with downstream issues, particularly libvirt ones :-). You would have better luck contacting the CentOS6 maintainers. Regards, Jim libvirt-libs-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-network-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-nwfilter-4.1.0-2.xen46.el6.x86_64 libgcc-4.4.7-18.el6_9.2.x86_64 2:qemu-img-0.12.1.2-2.503.el6_9.5.x86_64 libvirt-daemon-driver-storage-core-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-secret-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-interface-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-nodedev-4.1.0-2.xen46.el6.x86_64 10:centos-release-xen-common-8-4.el6.x86_64 xen-licenses-4.6.6-12.el6.x86_64 xen-libs-4.6.6-12.el6.x86_64 libvirt-daemon-driver-libxl-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-xen-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-qemu-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-storage-gluster-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-storage-logical-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-storage-mpath-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-storage-disk-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-storage-scsi-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-storage-iscsi-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-storage-4.1.0-2.xen46.el6.x86_64 libstdc++-4.4.7-18.el6_9.2.x86_64 libvirt-daemon-config-nwfilter-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-config-network-4.1.0-2.xen46.el6.x86_64 libvirt-daemon-driver-lxc-4.1.0-2.xen46.el6.x86_64 libvirt-client-4.1.0-2.xen46.el6.x86_64 linux-firmware-20171215-82.git2451bb22.el6.noarch 12:dhcp-common-4.1.1-53.P1.el6.centos.4.x86_64 12:dhclient-4.1.1-53.P1.el6.centos.4.x86_64 libvirt-4.1.0-2.xen46.el6.x86_64 10:centos-release-xen-46-8-4.el6.x86_64 10:centos-release-xen-44-8-4.el6.x86_64 
tzdata-2018e-3.el6.noarch libgomp-4.4.7-18.el6_9.2.x86_64 kernel-4.9.86-30.el6.x86_64 xen-hypervisor-4.6.6-12.el6.x86_64 xen-runtime-4.6.6-12.el6.x86_64 xen-4.6.6-12.el6.x86_64 libvirt-daemon-xen-4.1.0-2.xen46.el6.x86_64 Remedy is to kill -9 libvirtd and start again. Issue can be replicated within few domU starts. Usually libvirtd hangs when domU is bringing up xen drivers or something around udev, like: xen_netfront: Initialising Xen virtual ethernet driver I've been looking into libvirtd strace and debug logs, so far most suspicious in libvirtd debug log is this: libvirtd.log:2018-05-22 08:32:44.760+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/tx-7' libvirtd.log:2018-05-22 08:32:44.761+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/tx-6' libvirtd.log:2018-05-22 08:32:44.761+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/tx-4' libvirtd.log:2018-05-22 08:32:44.762+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/tx-5' libvirtd.log:2018-05-22 08:32:44.763+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/tx-2' libvirtd.log:2018-05-22 08:32:44.764+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/tx-3' libvirtd.log:2018-05-22 08:32:44.765+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/rx-6' libvirtd.log:2018-05-22 08:32:44.766+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/rx-5' libvirtd.log:2018-05-22 08:32:44.767+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/rx-4' libvirtd.log:2018-05-22 08:32:44.767+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/rx-7' libvirtd.log:2018-05-22 08:32:44.768+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/rx-2' libvirtd.log:2018-05-22 08:32:44.769+: 25455: debug : udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name '/sys/devices/vif-24-0/net/vif24.0/queues/rx-3' I could not get rid of this by reducing amount of driver queues (not sure if that applies to PV) Is someone out there seeing similar issues? Anyone perhaps interested in reviewing full deb
Re: [Xen-devel] libvirtd hang on CentOS6 after latest updates
On 07/25/2018 10:17 AM, George Dunlap wrote: On Wed, Jul 25, 2018 at 4:42 PM, Jim Fehlig wrote: On 07/22/2018 04:03 PM, Karel Hendrych wrote: Hi, I am seeing frequent libvirtd hangs (clients not responding) after last CentOS6-Xen update : xen-devel is not the best place to seek help with downstream issues, particularly libvirt ones :-). You would have better luck contacting the CentOS6 maintainers. In this case, it looks very much like they're using the Virt SIG binaries, which are pretty close to being straight-up packaging of the upstream tarballs, and the maintainers would be Anthony & I. And I at least know very little about libvirt. If Karel had posted this on centos-devel, I would almost certainly have ended up asking him to repost to xen-devel anyway, at which point I would have cc'd you. :-) Heh, ok :-). Does the error ring any bells? The udev messages are 'debug' level (not fatal) and unrelated IMO. It would be best to attach gdb to the libvirtd process and get a backtrace of all threads. Regards, Jim
[Xen-devel] [OSSTEST] Install GnuTLS for libvirt builds
Since libvirt commit 60d9ad6f, GnuTLS is required to build libvirt. The various libvirt build tests in osstest began failing after the commit hit libvirt.git master. Adding libgnutls28-dev to the list of packages needed to build libvirt will fix the currently broken builds. Signed-off-by: Jim Fehlig --- I cribbed the 'libgnutls28-dev' package name from the libvirt jenkins CI https://libvirt.org/git/?p=libvirt-jenkins-ci.git;a=blob;f=guests/vars/mappings.yml;h=be356aae616e7dacf603175fe1bea8ce398629e1;hb=HEAD#l138 Osstest/Toolstack/libvirt.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm index 45df173..d5cda77 100644 --- a/Osstest/Toolstack/libvirt.pm +++ b/Osstest/Toolstack/libvirt.pm @@ -26,7 +26,7 @@ use XML::LibXML; sub new { my ($class, $ho, $methname,$asset) = @_; -my @extra_packages = qw(libavahi-client3); +my @extra_packages = qw(libavahi-client3 libgnutls28-dev); my $nl_lib = "libnl-3-200"; $nl_lib = "libnl1" if ($ho->{Suite} =~ m/wheezy/); push(@extra_packages, $nl_lib); -- 2.18.0
Re: [Xen-devel] Likely build race, "/usr/bin/ld: cannot find -lvirt"
On 05/24/2018 04:27 AM, Ian Jackson wrote: Ian Jackson writes ("Likely build race, "/usr/bin/ld: cannot find -lvirt""): tl;dr: I think there is a bug in libvirt's build system which, with low probability, causes a build failure containing this message: /usr/bin/ld: cannot find -lvirt Complete build logs of two attempts: http://logs.test-lab.xenproject.org/osstest/logs/123046/build-i386-libvirt/6.ts-libvirt-build.log http://logs.test-lab.xenproject.org/osstest/logs/123096/build-i386-libvirt/6.ts-libvirt-build.log I have run a number of attempts. Out of 5 more, 1 succeeded. So out of a total of 7 attempts, 1 succeeded. This repro rate is an IMO excellent opportunity to debug this race :-). There appears to be a missing dependency between the lockd library and libvirt library, but my autotools skills lack the savvy to find it. Here we see the install command and relinking of lockd.la /bin/bash ../libtool --mode=install /usr/bin/install -c lockd.la '/home/osstest/build.123096.build-i386-libvirt/dist/usr/local/lib/libvirt/lock-driver' libtool: install: warning: relinking `lockd.la' libtool: install: (cd /home/osstest/build.123096.build-i386-libvirt/libvirt/src; /bin/bash /home/osstest/build.123096.build-i386-libvirt/libvirt/libtool --silent --tag CC --mode=relink gcc -std=gnu99 -I./conf -I/usr/include/libxml2 -fno-common -W -Waddress -Waggressive-loop-optimizations -Wall -Wattributes -Wbad-function-cast -Wbuiltin-macro-redefined -Wcast-align -Wchar-subscripts -Wclobbered -Wcomment -Wcomments -Wcoverage-mismatch -Wcpp -Wdate-time -Wdeprecated-declarations -Wdiv-by-zero -Wdouble-promotion -Wempty-body -Wendif-labels -Wextra -Wformat-contains-nul -Wformat-extra-args -Wformat-security -Wformat-y2k -Wformat-zero-length -Wfree-nonheap-object -Wignored-qualifiers -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Winit-self -Winline -Wint-to-pointer-cast -Winvalid-memory-model -Winvalid-pch -Wjump-misses-init -Wlogical-op -Wmain -Wmaybe-uninitialized 
-Wmemset-transposed-args -Wmissing-braces -Wmissing-declarations -Wmissing-field-initializers -Wmissing-include-dirs -Wmissing-parameter-type -Wmissing-prototypes -Wmultichar -Wnarrowing -Wnested-externs -Wnonnull -Wold-style-declaration -Wold-style-definition -Wopenmp-simd -Woverflow -Woverride-init -Wpacked-bitfield-compat -Wparentheses -Wpointer-arith -Wpointer-sign -Wpointer-to-int-cast -Wpragmas -Wpsabi -Wreturn-local-addr -Wreturn-type -Wsequence-point -Wshadow -Wsizeof-pointer-memaccess -Wstrict-aliasing -Wstrict-prototypes -Wsuggest-attribute=const -Wsuggest-attribute=format -Wsuggest-attribute=noreturn -Wsuggest-attribute=pure -Wswitch -Wsync-nand -Wtrampolines -Wtrigraphs -Wtype-limits -Wuninitialized -Wunknown-pragmas -Wunused -Wunused-but-set-parameter -Wunused-but-set-variable -Wunused-function -Wunused-label -Wunused-local-typedefs -Wunused-parameter -Wunused-result -Wunused-value -Wunused-variable -Wvarargs -Wvariadic-macros -Wvector-operation-performance -Wvolatile-register-var -Wwrite-strings -Wnormalized=nfc -Wno-sign-compare -Wjump-misses-init -Wswitch-enum -Wno-format-nonliteral -fstack-protector-strong -fexceptions -fasynchronous-unwind-tables -fipa-pure-const -Wno-suggest-attribute=pure -Wno-suggest-attribute=const -Werror -Wframe-larger-than=4096 -g -I/home/osstest/build.123096.build-i386-libvirt/xendist/usr/local/include/ -DLIBXL_API_VERSION=0x040400 -module -avoid-version -Wl,-z -Wl,nodelete -export-dynamic -Wl,-z -Wl,relro -Wl,-z -Wl,now -Wl,--no-copy-dt-needed-entries -Wl,-z -Wl,defs -g -L/home/osstest/build.123096.build-i386-libvirt/xendist/usr/local/lib/ -Wl,-rpath-link=/home/osstest/build.123096.build-i386-libvirt/xendist/usr/local/lib/ -o lockd.la -rpath /usr/local/lib/libvirt/lock-driver locking/lockd_la-lock_driver_lockd.lo locking/lockd_la-lock_protocol.lo libvirt.la ../gnulib/lib/libgnu.la -ldl -inst-prefix-dir /home/osstest/build.123096.build-i386-libvirt/dist) /usr/bin/ld: cannot find -lvirt collect2: error: ld returned 1 exit 
status libtool: install: error: relink `lockd.la' with the above command before installing it Makefile:6410: recipe for target 'install-lockdriverLTLIBRARIES' failed and several lines later it seems another parallel make job finally installs libvirt.la libtool: install: /usr/bin/install -c .libs/libvirt.lai /home/osstest/build.123096.build-i386-libvirt/dist/usr/local/lib/libvirt.la I've stared at the various Makefile.{,inc.}am files but can't spot the problem. Perhaps other libvirt maintainers with better autotools skills can give some hints. Regards, Jim
Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL
On 08/21/2018 05:14 AM, Jan Beulich wrote: On 21.08.18 at 03:11, wrote: flight 126201 xen-4.9-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/126201/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328 Something needs to be done about this, as this continued failure is blocking the 4.9.3 release. I did mail about this on Aug 2nd already for flight 125710, I've got back from Wei: This is libvirtd's error message. The remote host can't obtain the state change lock because it is already held by another task/thread. It could be a libvirt / libxl bug. 2018-08-01 16:12:13.433+: 3491: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975) I took a closer look at the logs and it appears the finish phase of migration fails to acquire the domain job lock since it is already held by the perform phase. In the perform phase, after the vm has been transferred to the dst, the qemu process associated with the vm is started. For whatever reason that takes a long time on this host: 2018-08-19 17:05:19.182+: libxl: libxl_dm.c:2235:libxl__spawn_local_dm: Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments: ... 2018-08-19 17:05:19.188+: libxl: libxl_exec.c:398:spawn_watch_event: domain 1 device model: spawn watch p=(null) ...
2018-08-19 17:05:51.529+: libxl: libxl_event.c:573:watchfd_callback: watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1: event epath=/local/domain/0/device-model/1/state 2018-08-19 17:05:51.529+: libxl: libxl_exec.c:398:spawn_watch_event: domain 1 device model: spawn watch p=running So the device model takes about 32 seconds to report running, longer than the 30-second wait for the job lock. In the meantime we move to the finish phase and time out waiting for the above perform phase to complete 2018-08-19 17:05:19.096+: 3492: debug : virThreadJobSet:96 : Thread 3492 (virNetServerHandleJob) is now running job remoteDispatchDomainMigrateFinish3Params ... 2018-08-19 17:05:49.253+: 3492: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24982) 2018-08-19 17:05:49.253+: 3492: error : libxlDomainObjBeginJob:155 : Timed out during operation: cannot acquire state change lock What could be causing the long startup time of qemu on these hosts? Does dom0 have enough CPU/memory? As you noticed, the libvirt commit used for this test has not changed in a long time, well before the failures appeared. Perhaps a subtle change in libxl is exposing the bug? Regardless, I'm happy to have looked at the issue since I think libvirt can be improved to cope with the problem. The thread running in the dst receiving the vm via libxl_domain_create_restore() can be created joinable, then joined in the finish phase before attempting to acquire the job lock. I'll look into making such an improvement in libvirt's libxl driver. Regards, Jim
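The proposed improvement can be sketched with plain pthreads: create the receive thread joinable, then join it in the finish phase before taking the job lock. Everything here is hypothetical. `receive_vm()` only stands in for the thread that would call libxl_domain_create_restore(), and a 100 ms sleep stands in for the slow device-model start seen in the logs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

struct migration_dst {
    pthread_t recv_thread;
    int restored;                 /* written by the receive thread */
};

/* Stand-in for receiving the VM and starting its device model. */
static void *
receive_vm(void *opaque)
{
    struct migration_dst *mig = opaque;
    struct timespec slow = { .tv_sec = 0, .tv_nsec = 100000000L };  /* 100 ms */

    nanosleep(&slow, NULL);
    mig->restored = 1;
    return NULL;
}

/* Perform phase: start the receiver; default attributes are joinable. */
static int
perform_phase(struct migration_dst *mig)
{
    mig->restored = 0;
    return pthread_create(&mig->recv_thread, NULL, receive_vm, mig) == 0 ? 0 : -1;
}

/*
 * Finish phase: wait for the receiver to exit first, so the job lock is
 * only requested after the perform-side work is done, instead of racing
 * it and timing out.
 */
static int
finish_phase(struct migration_dst *mig)
{
    if (pthread_join(mig->recv_thread, NULL) != 0)
        return -1;
    /* ...begin the finish job and complete the migration here... */
    return mig->restored ? 0 : -1;
}
```

Joining also makes everything the receiver wrote visible to the finish phase, so reading `restored` afterwards needs no extra locking.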
Re: [Xen-devel] [libvirt test] 126429: regressions - FAIL
On 08/24/2018 04:48 AM, Wei Liu wrote:
> On Fri, Aug 24, 2018 at 10:25:49AM +, osstest service owner wrote:
>> flight 126429 libvirt real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/126429/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  build-i386-libvirt    6 libvirt-build   fail REGR. vs. 123814
>>  build-amd64-libvirt   6 libvirt-build   fail REGR. vs. 123814
>>  build-armhf-libvirt   6 libvirt-build   fail REGR. vs. 123814
>
> Missing build dependency in osstest. GnuTLS has become a hard requirement.

I mentioned this a while ago, and even sent a patch :-)

https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg02584.html

Regards,
Jim
[Xen-devel] [PATCH 0/5] libxl: various migration V3 improvements
Patch 5 fixes a long-standing problem found by some very slow hosts in
xen's osstest

https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg01945.html

While working on the fix, I discovered other problems in libxl's V3
migration protocol. E.g. a modify job on the migrating VM was not handled
properly across the phases on either the src or dst host. Patches 1-4 fix
this and other problems found along the way.

Jim Fehlig (5):
  libxl: migration: defer removing VM until finish phase
  libxl: fix logic in P2P migration
  libxl: fix job handling across migration phases on src
  libxl: fix job handling across migration phases on dst
  libxl: join with thread receiving migration data

 src/libxl/libxl_domain.h    |   1 +
 src/libxl/libxl_driver.c    |   7 --
 src/libxl/libxl_migration.c | 168
 3 files changed, 114 insertions(+), 62 deletions(-)

-- 
2.18.0
[Xen-devel] [PATCH 3/5] libxl: fix job handling across migration phases on src
The libxlDomainMigrationSrc* functions are a bit flawed in their handling
of modify jobs. A job begins at the start of the begin phase but ends
before the phase completes. No job is running for the remaining phases of
migration on the source host.

Change the logic to keep the job running after a successful begin phase,
and end the job in the confirm phase. The job must also end in the perform
phase in the case of error, since the confirm phase would not be executed.

Signed-off-by: Jim Fehlig
---
 src/libxl/libxl_migration.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index e4f2895690..191973edeb 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -399,6 +399,11 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
     virDomainDefPtr def;
     char *xml = NULL;

+    /*
+     * In the case of successful migration, a job is started here and
+     * terminated in the confirm phase. Errors in the begin or perform
+     * phase will also terminate the job.
+     */
     if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
         goto cleanup;

@@ -428,6 +433,9 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
         goto endjob;

     xml = virDomainDefFormat(def, cfg->caps, VIR_DOMAIN_DEF_FORMAT_SECURE);
+    /* Valid xml means success! EndJob in the confirm phase */
+    if (xml)
+        goto cleanup;

  endjob:
     libxlDomainObjEndJob(driver, vm);
@@ -1169,6 +1177,14 @@ libxlDomainMigrationSrcPerformP2P(libxlDriverPrivatePtr driver,
     ret = libxlDoMigrateSrcP2P(driver, vm, sconn, xmlin, dconn, dconnuri,
                                dname, uri_str, flags);

+    if (ret < 0) {
+        /*
+         * Confirm phase will not be executed if perform fails. End the
+         * job started in begin phase.
+         */
+        libxlDomainObjEndJob(driver, vm);
+    }
+
  cleanup:
     orig_err = virSaveLastError();
     virObjectUnlock(vm);
@@ -1232,11 +1248,17 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivatePtr driver,
     ret = libxlDoMigrateSrcSend(driver, vm, flags, sockfd);
     virObjectLock(vm);

-    if (ret < 0)
+    if (ret < 0) {
         virDomainLockProcessResume(driver->lockManager, "xen:///system",
                                    vm, priv->lockState);
+        /*
+         * Confirm phase will not be executed if perform fails. End the
+         * job started in begin phase.
+         */
+        libxlDomainObjEndJob(driver, vm);
+    }

  cleanup:
     VIR_FORCE_CLOSE(sockfd);
@@ -1386,6 +1408,8 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivatePtr driver,
     ret = 0;

  cleanup:
+    /* EndJob for corresponding BeginJob in begin phase */
+    libxlDomainObjEndJob(driver, vm);
     virObjectEventStateQueue(driver->domainEventState, event);
     virObjectUnref(cfg);
     return ret;
-- 
2.18.0
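After this patch, the source-side job lifetime follows a simple invariant. The job taken in the begin phase stays held until the confirm phase ends it. If perform fails, confirm is skipped, so perform ends the job instead. The sketch below is a hypothetical C model of that rule, not libvirt code. struct dom, src_begin(), src_perform(), src_confirm() and migrate() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-domain state: is a modify job currently held? */
struct dom {
    bool job_active;
};

static int job_begin(struct dom *d)
{
    if (d->job_active)
        return -1;                  /* another job owns the domain */
    d->job_active = true;
    return 0;
}

static void job_end(struct dom *d)
{
    d->job_active = false;
}

/* Begin phase: on success the job is deliberately left running. */
static int src_begin(struct dom *d, bool fail)
{
    if (job_begin(d) < 0)
        return -1;
    if (fail) {
        job_end(d);                 /* an error in begin ends the job */
        return -1;
    }
    return 0;
}

/* Perform phase: confirm will not run if this fails, so end the job here. */
static int src_perform(struct dom *d, bool fail)
{
    if (fail) {
        job_end(d);
        return -1;
    }
    return 0;
}

/* Confirm phase: ends the job started in the begin phase. */
static void src_confirm(struct dom *d)
{
    job_end(d);
}

/* Run the phases in order; returns true if the job was left held (a leak). */
static bool migrate(struct dom *d, bool fail_begin, bool fail_perform)
{
    if (src_begin(d, fail_begin) == 0 && src_perform(d, fail_perform) == 0)
        src_confirm(d);
    return d->job_active;
}
```

Every path through migrate() leaves the job released. Between a successful begin and confirm, the job stays held, so another modify job cannot sneak in.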
[Xen-devel] [PATCH 2/5] libxl: fix logic in P2P migration
libxlDoMigrateSrcP2P() performs all phases of the migration protocol for
peer-to-peer migration. Unfortunately the logic was a bit flawed, since it
is possible to skip the confirm phase after a successful begin and prepare
phase. Fix the logic to always call the confirm phase after a successful
begin and perform. Skip the confirm phase if begin or perform fail.

Signed-off-by: Jim Fehlig
---
 src/libxl/libxl_migration.c | 48 ++---
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index 97f72d0390..e4f2895690 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -972,30 +972,35 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
     char *cookieout = NULL;
     int cookieoutlen;
     bool cancelled = true;
+    bool notify_source = true;
     virErrorPtr orig_err = NULL;
     int ret = -1;
     /* For tunnel migration */
     virStreamPtr st = NULL;
     struct libxlTunnelControl *tc = NULL;

+    if (dname &&
+        virTypedParamsAddString(&params, &nparams, &maxparams,
+                                VIR_MIGRATE_PARAM_DEST_NAME, dname) < 0)
+        goto cleanup;
+
+    if (uri &&
+        virTypedParamsAddString(&params, &nparams, &maxparams,
+                                VIR_MIGRATE_PARAM_URI, uri) < 0)
+        goto cleanup;
+
     dom_xml = libxlDomainMigrationSrcBegin(sconn, vm, xmlin,
                                            &cookieout, &cookieoutlen);
+    /*
+     * If dom_xml is non-NULL the begin phase has succeeded, and the
+     * confirm phase must be called to cleanup the migration operation.
+     */
     if (!dom_xml)
         goto cleanup;

     if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                 VIR_MIGRATE_PARAM_DEST_XML, dom_xml) < 0)
-        goto cleanup;
-
-    if (dname &&
-        virTypedParamsAddString(&params, &nparams, &maxparams,
-                                VIR_MIGRATE_PARAM_DEST_NAME, dname) < 0)
-        goto cleanup;
-
-    if (uri &&
-        virTypedParamsAddString(&params, &nparams, &maxparams,
-                                VIR_MIGRATE_PARAM_URI, uri) < 0)
-        goto cleanup;
+        goto confirm;

     /* We don't require the destination to have P2P support
      * as it looks to be normal migration from the receiver perpective.
@@ -1006,7 +1011,7 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
     virObjectUnlock(vm);
     if (flags & VIR_MIGRATE_TUNNELLED) {
         if (!(st = virStreamNew(dconn, 0)))
-            goto cleanup;
+            goto confirm;
         ret = dconn->driver->domainMigratePrepareTunnel3Params
             (dconn, st, params, nparams, cookieout, cookieoutlen, NULL, NULL, destflags);
     } else {
@@ -1016,7 +1021,7 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
     virObjectLock(vm);

     if (ret == -1)
-        goto cleanup;
+        goto confirm;

     if (!(flags & VIR_MIGRATE_TUNNELLED)) {
         if (uri_out) {
@@ -1038,8 +1043,10 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
     else
         ret = libxlDomainMigrationSrcPerform(driver, vm, NULL, NULL,
                                              uri_out, NULL, flags);
-    if (ret < 0)
+    if (ret < 0) {
+        notify_source = false;
         orig_err = virSaveLastError();
+    }

     cancelled = (ret < 0);

@@ -1067,12 +1074,15 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
     if (!orig_err)
         orig_err = virSaveLastError();

-    VIR_DEBUG("Confirm3 cancelled=%d vm=%p", cancelled, vm);
-    ret = libxlDomainMigrationSrcConfirm(driver, vm, flags, cancelled);
+ confirm:
+    if (notify_source) {
+        VIR_DEBUG("Confirm3 cancelled=%d vm=%p", cancelled, vm);
+        ret = libxlDomainMigrationSrcConfirm(driver, vm, flags, cancelled);

-    if (ret < 0)
-        VIR_WARN("Guest %s probably left in 'paused' state on source",
-                 vm->def->name);
+        if (ret < 0)
+            VIR_WARN("Guest %s probably left in 'paused' state on source",
+                     vm->def->name);
+    }

  cleanup:
     if (flags & VIR_MIGRATE_TUNNELLED) {
-- 
2.18.0
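The control flow that this patch gives libxlDoMigrateSrcP2P() can be checked in isolation. The following is a hypothetical model, not the libvirt function. Each phase is a stub that fails on request, and a trace string records which phases ran. The model follows the patch's labels:

- A failure in begin jumps to cleanup.
- A failure after a successful begin jumps to confirm.
- A failed perform clears notify_source, so confirm is skipped.

Based on the hunks shown, the model assumes the destination finish still runs after perform regardless of its result.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical choice of which phase fails. */
enum p2p_fail { P2P_OK, P2P_FAIL_BEGIN, P2P_FAIL_PREPARE, P2P_FAIL_PERFORM };

/*
 * Model of the patched P2P flow. One letter is appended per phase run:
 * B = begin, P = prepare (dst), R = perform, F = finish (dst),
 * C = confirm, c = confirm with cancelled == true.
 * trace must have room for at least 6 characters.
 */
static void p2p_migrate(enum p2p_fail fail, char *trace)
{
    bool notify_source = true;
    bool cancelled = true;

    trace[0] = '\0';

    strcat(trace, "B");
    if (fail == P2P_FAIL_BEGIN)
        goto cleanup;           /* begin failed: nothing to confirm */

    strcat(trace, "P");
    if (fail == P2P_FAIL_PREPARE)
        goto confirm;           /* begin succeeded: confirm must still run */

    strcat(trace, "R");
    if (fail == P2P_FAIL_PERFORM)
        notify_source = false;  /* a failed perform already ended the job */
    cancelled = (fail == P2P_FAIL_PERFORM);

    strcat(trace, "F");         /* dst finish runs whether or not perform worked */

 confirm:
    if (notify_source)
        strcat(trace, cancelled ? "c" : "C");

 cleanup:
    return;
}
```

A prepare failure now yields "BPc", where confirm runs with cancelled set. Before the patch, that path jumped straight to cleanup and skipped confirm.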
[Xen-devel] [PATCH 5/5] libxl: join with thread receiving migration data
It is possible the incoming VM is not fully started when the finish
phase of migration is executed. In libxlDomainMigrationDstFinish, wait
for the thread receiving the VM to complete before executing finish
phase tasks.

Signed-off-by: Jim Fehlig
---
 src/libxl/libxl_domain.h    |  1 +
 src/libxl/libxl_migration.c | 20
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/libxl/libxl_domain.h b/src/libxl/libxl_domain.h
index 5d83230cd6..e193881450 100644
--- a/src/libxl/libxl_domain.h
+++ b/src/libxl/libxl_domain.h
@@ -65,6 +65,7 @@ struct _libxlDomainObjPrivate {
     /* console */
     virChrdevsPtr devs;
     libxl_evgen_domain_death *deathW;
+    virThreadPtr migrationDstReceiveThr;
     unsigned short migrationPort;
     char *lockState;

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index 54b01a3169..fc7ccb53d0 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -297,9 +297,9 @@ libxlMigrateDstReceive(virNetSocketPtr sock,
     libxlMigrationDstArgs *args = opaque;
     virNetSocketPtr *socks = args->socks;
     size_t nsocks = args->nsocks;
+    libxlDomainObjPrivatePtr priv = args->vm->privateData;
     virNetSocketPtr client_sock;
     int recvfd = -1;
-    virThread thread;
     size_t i;

     /* Accept migration connection */
@@ -318,7 +318,10 @@ libxlMigrateDstReceive(virNetSocketPtr sock,
      * the migration data
      */
     args->recvfd = recvfd;
-    if (virThreadCreate(&thread, false,
+    VIR_FREE(priv->migrationDstReceiveThr);
+    if (VIR_ALLOC(priv->migrationDstReceiveThr) < 0)
+        goto fail;
+    if (virThreadCreate(priv->migrationDstReceiveThr, true,
                         libxlDoMigrateDstReceive, args) < 0) {
         virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                        _("Failed to create thread for receiving migration data"));
@@ -557,7 +560,6 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
     libxlDriverPrivatePtr driver = dconn->privateData;
     virDomainObjPtr vm = NULL;
     libxlMigrationDstArgs *args = NULL;
-    virThread thread;
     bool taint_hook = false;
     libxlDomainObjPrivatePtr priv = NULL;
     char *xmlout = NULL;
@@ -617,7 +619,10 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
     args->nsocks = 0;
     mig = NULL;

-    if (virThreadCreate(&thread, false, libxlDoMigrateDstReceive, args) < 0) {
+    VIR_FREE(priv->migrationDstReceiveThr);
+    if (VIR_ALLOC(priv->migrationDstReceiveThr) < 0)
+        goto error;
+    if (virThreadCreate(priv->migrationDstReceiveThr, true, libxlDoMigrateDstReceive, args) < 0) {
         virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                        _("Failed to create thread for receiving migration data"));
         goto endjob;
@@ -1291,6 +1296,13 @@ libxlDomainMigrationDstFinish(virConnectPtr dconn,
     virObjectEventPtr event = NULL;
     virDomainPtr dom = NULL;

+    if (priv->migrationDstReceiveThr) {
+        virObjectUnlock(vm);
+        virThreadJoin(priv->migrationDstReceiveThr);
+        virObjectLock(vm);
+        VIR_FREE(priv->migrationDstReceiveThr);
+    }
+
     virPortAllocatorRelease(priv->migrationPort);
     priv->migrationPort = 0;

-- 
2.18.0
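The finish-phase hunk drops the VM object lock around virThreadJoin() and retakes it afterwards. A general reason for that pattern: if the thread being joined needs the same lock to complete, joining while holding it can deadlock. The sketch below is a hypothetical pthreads reduction of the pattern, not libvirt code. It assumes a receiver that takes the object lock to publish its result. struct vm, receive(), prepare(), finish() and incoming_migration() are illustrative names. The heap-allocated handle mirrors migrationDstReceiveThr.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical VM object guarded by its own lock. */
struct vm {
    pthread_mutex_t lock;
    bool running;
    pthread_t *receive_thr;       /* heap handle, NULL when no thread */
};

/* Receiver thread: needs the VM lock to publish the final state. */
static void *receive(void *opaque)
{
    struct vm *vm = opaque;

    pthread_mutex_lock(&vm->lock);
    vm->running = true;           /* stand-in for a completed restore */
    pthread_mutex_unlock(&vm->lock);
    return NULL;
}

/* Prepare: create a joinable thread and remember its handle. */
static int prepare(struct vm *vm)
{
    free(vm->receive_thr);
    if (!(vm->receive_thr = malloc(sizeof(*vm->receive_thr))))
        return -1;
    if (pthread_create(vm->receive_thr, NULL, receive, vm) != 0) {
        free(vm->receive_thr);
        vm->receive_thr = NULL;
        return -1;
    }
    return 0;
}

/* Finish: called with vm->lock held. Drop it while joining, then retake it. */
static bool finish(struct vm *vm)
{
    if (vm->receive_thr) {
        pthread_mutex_unlock(&vm->lock);
        pthread_join(*vm->receive_thr, NULL);
        pthread_mutex_lock(&vm->lock);
        free(vm->receive_thr);
        vm->receive_thr = NULL;
    }
    return vm->running;           /* the receiver has finished by now */
}

/* One incoming migration; returns whether finish saw a running VM. */
static bool incoming_migration(void)
{
    struct vm dom = { .running = false, .receive_thr = NULL };
    bool running = false;

    pthread_mutex_init(&dom.lock, NULL);
    if (prepare(&dom) == 0) {
        pthread_mutex_lock(&dom.lock);
        running = finish(&dom);
        pthread_mutex_unlock(&dom.lock);
    }
    pthread_mutex_destroy(&dom.lock);
    return running;
}
```

Because finish() joins before reading the state, it always observes the receiver's result instead of racing with it.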
[Xen-devel] [PATCH 1/5] libxl: migration: defer removing VM until finish phase
If for any reason the restore of a VM fails on the destination host in a
migration operation, the VM is removed (if not persistent) from the
virDomainObjList, meaning it is no longer available for additional
cleanup or processing in the finish phase. Defer removing the VM from
the virDomainObjList until the finish phase, which already contains
logic to remove the VM.

Signed-off-by: Jim Fehlig
---
 src/libxl/libxl_migration.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index b2e5847c58..97f72d0390 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -264,7 +264,6 @@ libxlDoMigrateDstReceive(void *opaque)
     libxlDriverPrivatePtr driver = args->conn->privateData;
     int recvfd = args->recvfd;
     size_t i;
-    int ret;

     virObjectRef(vm);
     virObjectLock(vm);
@@ -274,12 +273,10 @@ libxlDoMigrateDstReceive(void *opaque)
     /*
      * Always start the domain paused. If needed, unpause in the
      * finish phase, after transfer of the domain is complete.
+     * Errors and cleanup are also handled in the finish phase.
      */
-    ret = libxlDomainStartRestore(driver, vm, true, recvfd,
-                                  args->migcookie->xenMigStreamVer);
-
-    if (ret < 0 && !vm->persistent)
-        virDomainObjListRemove(driver->domains, vm);
+    libxlDomainStartRestore(driver, vm, true, recvfd,
+                            args->migcookie->xenMigStreamVer);

     /* Remove all listen socks from event handler, and close them. */
     for (i = 0; i < nsocks; i++) {
-- 
2.18.0
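The new division of labour can be modelled in a few lines. The receiving thread only records whether the restore worked. The finish phase, which can still find the VM, does the cleanup and then drops a transient VM. This is a hypothetical reduction, not libvirt code. struct vm, the one-slot domain_list, receive() and finish() are illustrative stand-ins for the VM object, virDomainObjList, and the two phases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical incoming VM. */
struct vm {
    bool persistent;
    bool restore_ok;
    bool cleaned_up;
};

/* Hypothetical one-slot domain list: NULL once the VM has been removed. */
static struct vm *domain_list;

/* Receiving thread after the patch: record the outcome, remove nothing. */
static void receive(struct vm *vm, bool restore_fails)
{
    vm->restore_ok = !restore_fails;
}

/*
 * Finish phase: because the VM is still listed, it can clean up after a
 * failed restore and only then drop a transient (non-persistent) VM.
 */
static void finish(struct vm *vm)
{
    if (domain_list != vm)
        return;                 /* pre-patch situation: nothing left to clean */
    if (!vm->restore_ok) {
        vm->cleaned_up = true;  /* stand-in for destroying the domain */
        if (!vm->persistent)
            domain_list = NULL;
    }
}
```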
[Xen-devel] [PATCH 4/5] libxl: fix job handling across migration phases on dst
The libxlDomainMigrationDst* functions are a bit flawed in their handling
of modify jobs. A job begins when the destination host begins receiving
the incoming VM and ends after the VM is started. The finish phase
contains another BeginJob/EndJob sequence.

This patch changes the logic to begin a job for the incoming VM in the
prepare phase and end the job in the finish phase.

Signed-off-by: Jim Fehlig
---
 src/libxl/libxl_driver.c    |  7
 src/libxl/libxl_migration.c | 65 +++--
 2 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 5a5e792957..73c2ff3546 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -6020,15 +6020,8 @@ libxlDomainMigrateFinish3Params(virConnectPtr dconn,
         return NULL;
     }

-    if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) {
-        virDomainObjEndAPI(&vm);
-        return NULL;
-    }
-
     ret = libxlDomainMigrationDstFinish(dconn, vm, flags, cancelled);

-    libxlDomainObjEndJob(driver, vm);
-
     virDomainObjEndAPI(&vm);

     return ret;
diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index 191973edeb..54b01a3169 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -266,9 +266,6 @@ libxlDoMigrateDstReceive(void *opaque)
     size_t i;

     virObjectRef(vm);
-    virObjectLock(vm);
-    if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
-        goto cleanup;

     /*
      * Always start the domain paused. If needed, unpause in the
@@ -288,10 +285,6 @@ libxlDoMigrateDstReceive(void *opaque)
     args->nsocks = 0;
     VIR_FORCE_CLOSE(recvfd);
     virObjectUnref(args);
-
-    libxlDomainObjEndJob(driver, vm);
-
- cleanup:
     virDomainObjEndAPI(&vm);
 }

@@ -583,6 +576,13 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
         goto error;
     *def = NULL;

+    /*
+     * Unless an error is encountered in this function, the job will
+     * be terminated in the finish phase.
+     */
+    if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
+        goto error;
+
     priv = vm->privateData;

     if (taint_hook) {
@@ -595,18 +595,18 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
      * stream -> pipe -> recvfd of libxlDomainStartRestore
      */
     if (pipe(dataFD) < 0)
-        goto error;
+        goto endjob;

     /* Stream data will be written to pipeIn */
     if (virFDStreamOpen(st, dataFD[1]) < 0)
-        goto error;
+        goto endjob;
     dataFD[1] = -1; /* 'st' owns the FD now & will close it */

     if (libxlMigrationDstArgsInitialize() < 0)
-        goto error;
+        goto endjob;

     if (!(args = virObjectNew(libxlMigrationDstArgsClass)))
-        goto error;
+        goto endjob;

     args->conn = virObjectRef(dconn);
     args->vm = virObjectRef(vm);
@@ -620,12 +620,15 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
     if (virThreadCreate(&thread, false, libxlDoMigrateDstReceive, args) < 0) {
         virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                        _("Failed to create thread for receiving migration data"));
-        goto error;
+        goto endjob;
     }

     ret = 0;
     goto done;

+ endjob:
+    libxlDomainObjEndJob(driver, vm);
+
  error:
     libxlMigrationCookieFree(mig);
     VIR_FORCE_CLOSE(dataFD[1]);
@@ -679,6 +682,13 @@ libxlDomainMigrationDstPrepare(virConnectPtr dconn,
         goto error;
     *def = NULL;

+    /*
+     * Unless an error is encountered in this function, the job will
+     * be terminated in the finish phase.
+     */
+    if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
+        goto error;
+
     priv = vm->privateData;

     if (taint_hook) {
@@ -689,27 +699,27 @@ libxlDomainMigrationDstPrepare(virConnectPtr dconn,
     /* Create socket connection to receive migration data */
     if (!uri_in) {
         if ((hostname = virGetHostname()) == NULL)
-            goto error;
+            goto endjob;

         if (STRPREFIX(hostname, "localhost")) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("hostname on destination resolved to localhost,"
                              " but migration requires an FQDN"));
-            goto error;
+            goto endjob;
         }

         if (virPortAllocatorAcquire(driver->migrationPorts, &port) < 0)
-            goto error;
+            goto endjob;

         priv->migrationPort = port;
         if (virAsprintf(uri_out, "tcp://%s:%d", hostname, port) < 0)
-            goto error;
+            goto endjob;
     } else {
         if (!(STRPREFIX(uri_in, "tcp://"))) {
             /* not full URI, add prefix tcp:// */
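This patch introduces the same error-path layout in both prepare functions:

- The job begins once the VM exists.
- Later failures jump to endjob, which releases the job and falls through to the existing error label.
- On success the job stays held for the finish phase.

The sketch below is a hypothetical reduction of that layout, not libvirt code. prepare(), finish(), the prep_fail values and the job_active flag are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical failure points in the destination prepare phase. */
enum prep_fail { PREP_OK, PREP_FAIL_BEFORE_JOB, PREP_FAIL_AFTER_JOB };

static bool job_active;             /* stand-in for the domain's modify job */

static int job_begin(void)
{
    if (job_active)
        return -1;
    job_active = true;
    return 0;
}

static void job_end(void)
{
    job_active = false;
}

/*
 * Prepare phase after the patch: the job is taken once the VM exists.
 * Later failures use "goto endjob", which releases the job and falls
 * through to the pre-existing "error" label. On success the job is left
 * running and is ended by the finish phase.
 */
static int prepare(enum prep_fail fail)
{
    if (fail == PREP_FAIL_BEFORE_JOB)   /* e.g. adding the VM fails */
        goto error;

    if (job_begin() < 0)
        goto error;

    if (fail == PREP_FAIL_AFTER_JOB)    /* e.g. pipe(), port or thread setup */
        goto endjob;

    return 0;

 endjob:
    job_end();
 error:
    return -1;
}

/* Finish phase: ends the job begun in prepare. */
static void finish(void)
{
    job_end();
}
```

Placing endjob directly above error lets each failure site pick the right amount of unwinding with a single goto. This is the usual libvirt cleanup-label style seen in the hunks.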
Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL
On 08/24/2018 02:58 AM, Wei Liu wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>> On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>> On 21.08.18 at 03:11, wrote:
>>>> flight 126201 xen-4.9-testing real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>>>
>>> Something needs to be done about this, as this continued failure is
>>> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
>>> for flight 125710, I've got back from Wei:
>>>
>>> This is libvirtd's error message. The remote host can't obtain the
>>> state change lock because it is already held by another task/thread.
>>>
>>> It could be a libvirt / libxl bug.
>>>
>>> 2018-08-01 16:12:13.433+: 3491: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>>
>> I took a closer look at the logs and it appears the finish phase of
>> migration fails to acquire the domain job lock since it is already
>> held by the perform phase. In the perform phase, after the vm has been
>> transferred to the dst, the qemu process associated with the vm is
>> started. For whatever reason that takes a long time on this host:
>>
>> 2018-08-19 17:05:19.182+: libxl: libxl_dm.c:2235:libxl__spawn_local_dm: Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments: ...
>> 2018-08-19 17:05:19.188+: libxl: libxl_exec.c:398:spawn_watch_event: domain 1 device model: spawn watch p=(null)
>
> This is a spurious event after the watch has been set up.
>
>> ...
>> 2018-08-19 17:05:51.529+: libxl: libxl_event.c:573:watchfd_callback: watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1: event epath=/local/domain/0/device-model/1/state
>> 2018-08-19 17:05:51.529+: libxl: libxl_exec.c:398:spawn_watch_event: domain 1 device model: spawn watch p=running
>
> So it has taken 32s for QEMU to write "running" in xenstore. This,
> however, is still within the timeout limit set by libxl (60s).

Right, but it is not within libvirt's job wait timeout, which is 30s.
I've sent a series to fix this and other problems I found while
testing/debugging

https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html

Assuming those patches are committed to libvirt.git master, it's not
clear how they will improve this and other tests that use an older,
fixed libvirt commit.

Regards,
Jim
[Xen-devel] [PATCH OSSTEST] Install GnuTLS for libvirt builds
Since libvirt commit 60d9ad6f, GnuTLS is required to build libvirt. The
various libvirt build tests in osstest began failing after the commit hit
libvirt.git master. Adding libgnutls28-dev to the list of packages needed
to build libvirt will fix the currently broken builds.

Signed-off-by: Jim Fehlig
---

Rebase and repost of

https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg02584.html

 Osstest/Toolstack/libvirt.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm
index 45df173..d5cda77 100644
--- a/Osstest/Toolstack/libvirt.pm
+++ b/Osstest/Toolstack/libvirt.pm
@@ -26,7 +26,7 @@ use XML::LibXML;

 sub new {
     my ($class, $ho, $methname,$asset) = @_;
-    my @extra_packages = qw(libavahi-client3);
+    my @extra_packages = qw(libavahi-client3 libgnutls28-dev);
     my $nl_lib = "libnl-3-200";
     $nl_lib = "libnl1" if ($ho->{Suite} =~ m/wheezy/);
     push(@extra_packages, $nl_lib);
-- 
2.18.0
Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL
On 9/5/18 3:37 PM, Jim Fehlig wrote:
> On 08/24/2018 02:58 AM, Wei Liu wrote:
>> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
[...]
>> So it has taken 32s for QEMU to write "running" in xenstore. This,
>> however, is still within the timeout limit set by libxl (60s).
>
> Right, but it is not within libvirt's job wait timeout, which is 30s.
> I've sent a series to fix this and other problems I found while
> testing/debugging
>
> https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html
>
> Assuming those patches are committed to libvirt.git master, it's not
> clear how they will improve this and other tests that use an older,
> fixed libvirt commit.

FYI, the patches fixing this problem from the libvirt side have been
committed to libvirt.git master now. See commits 60b4fd90, e39c66d3,
47da84e0, 0149464a, and 5ea2abb3.

Regards,
Jim
Re: [Xen-devel] virsh support?
On 9/14/18 8:08 AM, Dag Nygren wrote:
> Hi!
>
> Can someone inform me on XEN vtpm support in libvirt?
> From which version if so?

FYI, questions regarding libvirt are better directed to libvirt-l...@redhat.com

> Asking because I tried to do a "virsh dumpxml" on a XEN machine with
> vtpm attached and "xl list -l" lists it fine but there is nothing in the
> dumpxml result??

The libxl driver in libvirt does not support vtpm, and AFAIK no one is
working on that. Patches welcome :-).

Regards,
Jim
Re: [Xen-devel] [OSSTEST PATCH v2] ts-xen-build-prep: install libgnutls28-dev for libvirt build
On 9/24/18 3:49 AM, Wei Liu wrote:
> d54ecf31b2 placed the build dependency in the wrong file. This patch
> adds the dependency to the right file.
>
> Add a runtime dependency in libvirt.pm.

Thanks for fixing my fix :-).

Regards,
Jim

> Signed-off-by: Wei Liu
> ---
> Cc: Jim Fehlig
> ---
>  Osstest/Toolstack/libvirt.pm | 2 +-
>  ts-xen-build-prep            | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm
> index d5cda77e..13f92dae 100644
> --- a/Osstest/Toolstack/libvirt.pm
> +++ b/Osstest/Toolstack/libvirt.pm
> @@ -26,7 +26,7 @@ use XML::LibXML;
>
>  sub new {
>      my ($class, $ho, $methname,$asset) = @_;
> -    my @extra_packages = qw(libavahi-client3 libgnutls28-dev);
> +    my @extra_packages = qw(libavahi-client3 libgnutls30);
>      my $nl_lib = "libnl-3-200";
>      $nl_lib = "libnl1" if ($ho->{Suite} =~ m/wheezy/);
>      push(@extra_packages, $nl_lib);
> diff --git a/ts-xen-build-prep b/ts-xen-build-prep
> index 77a2d284..23bbbeb9 100755
> --- a/ts-xen-build-prep
> +++ b/ts-xen-build-prep
> @@ -208,7 +208,8 @@ sub prep () {
>                        libxml2-utils libxml2-dev
>                        libdevmapper-dev w3c-dtd-xhtml libxml-xpath-perl
>                        libelf-dev
> -                      ccache nasm checkpolicy ebtables);
> +                      ccache nasm checkpolicy ebtables
> +                      libgnutls28-dev);
>
>      if ($ho->{Suite} !~ m/squeeze|wheezy/) {
>          push(@packages, qw(ocaml-nox ocaml-findlib));
[Xen-devel] [PATCH] libxl: set channel devid when not provided by application
Applications like libvirt may not populate a device devid field,
delegating that to libxl. If needed, the application can later retrieve
the libxl-produced devid. Indeed most devices are handled this way in
libvirt, channel devices included.

This works well when only one channel device is defined, but more than
one results in

qemu-system-i386: -chardev socket,id=libxl-channel-1,\
  path=/tmp/test-org.qemu.guest_agent.00,server,nowait:
  Duplicate ID 'libxl-channel-1' for chardev

Besides the odd '-1' value in the id, multiple channels have the same id,
causing qemu to fail. A simple fix is to set an uninitialized devid (-1)
to the dev_num passed to libxl__init_console_from_channel().

Signed-off-by: Jim Fehlig
---

I get the feeling that if needed devid should be set earlier, but this
seems like the most opportune spot. Suggestions for improvements welcome.

 tools/libxl/libxl_console.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c
index 39d8430df8..8faf3a24f3 100644
--- a/tools/libxl/libxl_console.c
+++ b/tools/libxl/libxl_console.c
@@ -401,6 +401,9 @@ int libxl__init_console_from_channel(libxl__gc *gc,

     /* Perform validation first, allocate second. */

+    if (channel->devid == -1)
+        channel->devid = dev_num;
+
     if (!channel->name) {
         LOG(ERROR, "channel %d has no name", channel->devid);
         return ERROR_INVAL;
-- 
2.16.0
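The following is a small standalone model of the fix. A channel whose devid still holds the "unset" value -1 takes the dev_num it is initialized with, which is its index in the channel list. The generated chardev ids therefore stop colliding. struct channel, init_channel() and chardev_id() are hypothetical stand-ins, not libxl's types. The id format mirrors the libxl-channel-N string in the qemu error above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reduced channel config; -1 means "let the library choose". */
struct channel {
    int devid;
    const char *name;
};

/* Mirror of the patch: default an unset devid to the channel's index. */
static void init_channel(struct channel *ch, int dev_num)
{
    if (ch->devid == -1)
        ch->devid = dev_num;
}

/* Build the chardev id in the form seen in the qemu error message. */
static void chardev_id(const struct channel *ch, char *buf, size_t len)
{
    snprintf(buf, len, "libxl-channel-%d", ch->devid);
}
```

An application-supplied devid is left untouched, so only callers that delegate the choice to the library are affected.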
[Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k
On several Skylake machines I've observed xl segfaults when running
create or destroy subcommands. Other subcommands may segfault too, but
I've only looked at create and destroy, which share a similar backtrace

Thread 2 (Thread 0x77ff3700 (LWP 2941)):
    at /usr/include/bits/unistd.h:44
    at xs.c:398
    fd=) at xs.c:1231

Thread 1 has canceled Thread 2 and is waiting for it in pthread_join().
The backtrace smelled of memory/stack overflow, which was verified by
increasing DEFAULT_THREAD_STACKSIZE to 32kb. Presumably the stack
overflow is observed on Skylake due to a broader CPU feature set which
must be saved within _dl_runtime_resolve and friends.

While PTHREAD_STACK_MIN should advertise a suitable stack size based on
the underlying system, increasing the default size makes xenstore a bit
more robust on systems with insufficient/broken minimums.

Signed-off-by: Jim Fehlig
---
 tools/xenstore/xs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index abffd9cd80..3891e4907c 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -800,7 +800,7 @@ bool xs_watch(struct xs_handle *h, const char *path, const char *token)
 	struct iovec iov[2];

 #ifdef USE_PTHREAD
-#define DEFAULT_THREAD_STACKSIZE (16 * 1024)
+#define DEFAULT_THREAD_STACKSIZE (32 * 1024)
 #define READ_THREAD_STACKSIZE 					\
 	((DEFAULT_THREAD_STACKSIZE < PTHREAD_STACK_MIN) ?	\
 	PTHREAD_STACK_MIN : DEFAULT_THREAD_STACKSIZE)
-- 
2.16.1
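The READ_THREAD_STACKSIZE macro in the hunk selects the larger of the built-in default and PTHREAD_STACK_MIN. The standalone sketch below shows how such a size is commonly applied to a helper thread through pthread_attr_setstacksize(). It is an assumption-laden illustration, not xenstore's code. create_reader() and reader() are hypothetical names, and only the two macros are copied from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

/* Copied from the patched xs.c: never go below the system minimum. */
#define DEFAULT_THREAD_STACKSIZE (32 * 1024)
#define READ_THREAD_STACKSIZE \
    ((DEFAULT_THREAD_STACKSIZE < PTHREAD_STACK_MIN) ? \
     PTHREAD_STACK_MIN : DEFAULT_THREAD_STACKSIZE)

/* Hypothetical thread body standing in for the watch reader. */
static void *reader(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Create and join a thread with an explicit stack size, reporting the
 * size the attribute object held. Returns 0 on success, else an errno.
 */
static int create_reader(size_t *stacksize_out)
{
    pthread_attr_t attr;
    pthread_t thr;
    int rc;

    if ((rc = pthread_attr_init(&attr)) != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, READ_THREAD_STACKSIZE);
    if (rc == 0)
        rc = pthread_attr_getstacksize(&attr, stacksize_out);
    if (rc == 0)
        rc = pthread_create(&thr, &attr, reader, NULL);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        rc = pthread_join(thr, NULL);
    return rc;
}
```

The size passed here is what the application requests. As the follow-up discussion notes, glibc may carve thread-local storage out of that same allocation, so a fixed constant can still prove too small.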
Re: [Xen-devel] [PATCH] libxl: set channel devid when not provided by application
Any comments on this patch? Thanks!

Regards,
Jim

On 02/07/2018 08:04 PM, Jim Fehlig wrote:
> Applications like libvirt may not populate a device devid field,
> delegating that to libxl. If needed, the application can later retrieve
> the libxl-produced devid. Indeed most devices are handled this way in
> libvirt, channel devices included.
[...]
Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k
On 02/21/2018 10:18 PM, Juergen Gross wrote:
> On 21/02/18 23:13, Jim Fehlig wrote:
>> On several Skylake machines I've observed xl segfaults when running
>> create or destroy subcommands.
[...]
>
> We hit something like this before:
>
> https://lists.xen.org/archives/html/xen-devel/2016-07/msg01727.html
>
> The main problem is that any thread local storage is taken from the
> stack without any interface being available for adjusting the _real_
> stack size instead of the memory for thread local storage + stack.
>
> So we can increase the stack size of the xenstore thread and wait for
> the next breakage, or we have to think about a proper solution.
>
> Right now I have no sensible idea how to address the problem, as the
> old thread suggests the underlying glibc problem isn't fixed yet (wow:
> the problem has been known for more than 7 years now):
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=11787

It looks like the bug I'm hitting is described in

https://sourceware.org/bugzilla/show_bug.cgi?id=22636

And unlike the other bug, it has been fixed.

Regards,
Jim
Re: [Xen-devel] [PATCH] tools/xenstore: try to get minimum thread stack size for watch thread
On 02/22/2018 06:53 AM, Juergen Gross wrote: When creating a pthread in xs_watch() try to get the minimal needed size of the thread from glibc instead of using a constant. This avoids problems when the library is used in programs with large per-thread memory. Use dlsym() to get the pointer to __pthread_get_minstack() in order to avoid linkage problems and fall back to the current constant size if not found. Signed-off-by: Juergen Gross --- Only compile tested. Jim, can you please verify this patch is solving your original problem? It didn't help, but it could be due to my buggy glibc # gdb xl ... (gdb) r create test-hvm.xl Starting program: /usr/sbin/xl create test-hvm.xl Parsing config from test-hvm.xl Program received signal SIGSEGV, Segmentation fault. 0x772d51c2 in __pthread_get_minstack () from /lib64/libpthread.so.0 (gdb) thr a a bt Thread 1 (Thread 0x77fd8780 (LWP 2568)): #0 0x772d51c2 in __pthread_get_minstack () from /lib64/libpthread.so.0 #1 0x766ae259 in xs_watch (h=0x5578fc90, path=path@entry=0x55798fa0 "/local/domain/0/device-model/2/state", token=token@entry=0x557990b0 "3/0") at xs.c:826 #2 0x779476f4 in libxl__ev_xswatch_register (gc=gc@entry=0x557955f0, w=w@entry=0x55797468, func=func@entry=0x7793dd10 , path=0x55798fa0 "/local/domain/0/device-model/2/state") at libxl_event.c:638 #3 0x7793deb0 in libxl__xswait_start (gc=gc@entry=0x557955f0, xswa=xswa@entry=0x557973e0) at libxl_aoutils.c:53 #4 0x779326b0 in libxl__spawn_spawn (egc=egc@entry=0x7fffd950, ss=ss@entry=0x55797370) at libxl_exec.c:292 #5 0x779258d3 in libxl__spawn_local_dm (egc=0x7fffd950, dmss=) at libxl_dm.c:2400 #6 0x7791d3a7 in domcreate_launch_dm (egc=0x7fffd950, multidev=0x55798168, ret=) at libxl_create.c:1379 #7 0x77967275 in libxl__bootloader_run (egc=egc@entry=0x7fffd950, bl=bl@entry=0x55796cc0) at libxl_bootloader.c:403 #8 0x7791ffe3 in initiate_domain_create (egc=egc@entry=0x7fffd950, dcs=dcs@entry=0x55796610) at libxl_create.c:997 #9 0x779201a1 in do_domain_create 
(ctx=ctx@entry=0x5578f2a0, d_config=d_config@entry=0x7fffdb70, domid=domid@entry=0x7fffdaa8, restore_fd=restore_fd@entry=-1, send_back_fd=send_back_fd@entry=-1, params=params@entry=0x0, ao_how=0x0, aop_console_how=0x0) at libxl_create.c:1682 #10 0x779204b6 in libxl_domain_create_new (ctx=0x5578f2a0, d_config=d_config@entry=0x7fffdb70, domid=domid@entry=0x7fffdaa8, ao_how=ao_how@entry=0x0, aop_console_how=aop_console_how@entry=0x0) at libxl_create.c:1885 #11 0x555780b4 in create_domain (dom_info=dom_info@entry=0x7fffe0b0) at xl_vmcontrol.c:902 #12 0x555790c4 in main_create (argc=1, argv=0x7fffe378) at xl_vmcontrol.c:1207 #13 0x55560c5b in main (argc=2, argv=0x7fffe370) at xl.c:384 If you like, I can try a patched glibc after the weekend :-). Regards, Jim --- tools/xenstore/Makefile | 4 tools/xenstore/xs.c | 19 ++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile index 2b99d2bc1b..fb6c73e297 100644 --- a/tools/xenstore/Makefile +++ b/tools/xenstore/Makefile @@ -100,6 +100,10 @@ libxenstore.so.$(MAJOR): libxenstore.so.$(MAJOR).$(MINOR) ln -sf $< $@ xs.opic: CFLAGS += -DUSE_PTHREAD +ifeq ($(CONFIG_Linux),y) +xs.opic: CFLAGS += -DUSE_DLSYM +xs.opic: LDFLAGS += -ldl +endif libxenstore.so.$(MAJOR).$(MINOR): xs.opic xs_lib.opic $(CC) $(LDFLAGS) $(PTHREAD_LDFLAGS) -Wl,$(SONAME_LDFLAG) -Wl,libxenstore.so.$(MAJOR) $(SHLIB_LDFLAGS) -o $@ $^ $(LDLIBS_libxentoolcore) $(SOCKET_LIBS) $(PTHREAD_LIBS) $(APPEND_LDFLAGS) diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c index abffd9cd80..8372f5b1a4 100644 --- a/tools/xenstore/xs.c +++ b/tools/xenstore/xs.c @@ -47,6 +47,11 @@ struct xs_stored_msg { #include +#ifdef USE_DLSYM +#define __USE_GNU +#include +#endif + struct xs_handle { /* Communications channel to xenstore daemon. 
*/ int fd; @@ -810,12 +815,24 @@ bool xs_watch(struct xs_handle *h, const char *path, const char *token) if (!h->read_thr_exists) { sigset_t set, old_set; pthread_attr_t attr; + static size_t stack_size; +#ifdef USE_DLSYM + size_t (*getsz)(void); +#endif + if (!stack_size) { +#ifdef USE_DLSYM + getsz = dlsym(RTLD_DEFAULT, "__pthread_get_minstack"); + stack_size = getsz ? getsz() : READ_THREAD_STACKSIZE; +#else + stack_size = READ_THREAD_STACKSIZE; +#endif + } if (pthread_attr_init(&attr) != 0) { mutex_unlock(&h->request_mutex); return false; }
Re: [Xen-devel] [PATCH v2] tools/xenstore: try to get minimum thread stack size for watch thread
On 02/26/2018 01:46 AM, Juergen Gross wrote: When creating a pthread in xs_watch() try to get the minimal needed size of the thread from glibc instead of using a constant. This avoids problems when the library is used in programs with large per-thread memory. Use dlsym() to get the pointer to __pthread_get_minstack() in order to avoid linkage problems and fall back to the current constant size if not found. Signed-off-by: Juergen Gross --- V2: - use _GNU_SOURCE (Wei Liu) - call __pthread_get_minstack() with parameter - add -ldl to correct make flags - ensure to not using smaller stack size than today --- tools/xenstore/Makefile | 4 tools/xenstore/xs.c | 21 - 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile index 2b99d2bc1b..0831be0b6f 100644 --- a/tools/xenstore/Makefile +++ b/tools/xenstore/Makefile @@ -100,6 +100,10 @@ libxenstore.so.$(MAJOR): libxenstore.so.$(MAJOR).$(MINOR) ln -sf $< $@ xs.opic: CFLAGS += -DUSE_PTHREAD +ifeq ($(CONFIG_Linux),y) +xs.opic: CFLAGS += -DUSE_DLSYM +libxenstore.so.$(MAJOR).$(MINOR): LDFLAGS += -ldl +endif Dropping this patch in one of my automated builds caused a libxenstore link failure [ 99s] gcc -lsystemd -ldl -pthread -Wl,-soname -Wl,libxenstore.so.3.0 -shared -o libxenstore.so.3.0.3 xs.opic xs_lib.opic /home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/libs/toolcore/libxentoolcore.so [ 99s] /home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/xenstore/libxenstore.so: undefined reference to `dlsym' I hacked around it by appending '-ldl' to the end of the subsequent libxenstore.so rule.
libxenstore.so.$(MAJOR).$(MINOR): xs.opic xs_lib.opic $(CC) $(LDFLAGS) $(PTHREAD_LDFLAGS) -Wl,$(SONAME_LDFLAG) -Wl,libxenstore.so.$(MAJOR) $(SHLIB_LDFLAGS) -o $@ $^ $(LDLIBS_libxentoolcore) $(SOCKET_LIBS) $(PTHREAD_LIBS) $(APPEND_LDFLAGS) diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c index abffd9cd80..77700bff2b 100644 --- a/tools/xenstore/xs.c +++ b/tools/xenstore/xs.c @@ -16,6 +16,8 @@ License along with this library; If not, see <http://www.gnu.org/licenses/>. */ +#define _GNU_SOURCE + #include #include #include @@ -47,6 +49,10 @@ struct xs_stored_msg { #include +#ifdef USE_DLSYM +#include +#endif + struct xs_handle { /* Communications channel to xenstore daemon. */ int fd; @@ -810,12 +816,25 @@ bool xs_watch(struct xs_handle *h, const char *path, const char *token) if (!h->read_thr_exists) { sigset_t set, old_set; pthread_attr_t attr; + static size_t stack_size; +#ifdef USE_DLSYM + size_t (*getsz)(pthread_attr_t *attr); +#endif if (pthread_attr_init(&attr) != 0) { mutex_unlock(&h->request_mutex); return false; } - if (pthread_attr_setstacksize(&attr, READ_THREAD_STACKSIZE) != 0) { + if (!stack_size) { +#ifdef USE_DLSYM + getsz = dlsym(RTLD_DEFAULT, "__pthread_get_minstack"); + if (getsz) + stack_size = getsz(&attr); +#endif + if (stack_size < READ_THREAD_STACKSIZE) + stack_size = READ_THREAD_STACKSIZE; + } + if (pthread_attr_setstacksize(&attr, stack_size) != 0) { pthread_attr_destroy(&attr); mutex_unlock(&h->request_mutex); return false; This worked fine, even on the system with the buggy glibc. Tested-by: Jim Fehlig Regards, Jim
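The V2 logic can be sketched in isolation as follows. This assumes glibc on Linux, where `__pthread_get_minstack()` is a private helper taking a `pthread_attr_t *` (as V2 calls it). Because it is not a public API, it is looked up with `dlsym(RTLD_DEFAULT, ...)`, and the code falls back to a fixed floor when the lookup fails. The 16 KiB value given to `READ_THREAD_STACKSIZE` and the function name `watch_stack_size` are assumptions for illustration, not taken from xs.c. On glibc releases before 2.34, dlsym() lives in libdl, so linking needs `-ldl`.

```c
#define _GNU_SOURCE            /* exposes RTLD_DEFAULT in <dlfcn.h> */
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

#ifndef RTLD_DEFAULT
/* glibc's value; only needed if _GNU_SOURCE was defined too late. */
#define RTLD_DEFAULT ((void *) 0)
#endif

/* Fallback floor; the value is assumed here for illustration. */
#define READ_THREAD_STACKSIZE (16 * 1024)

/* Signature of the glibc-private helper, as called by the V2 patch. */
typedef size_t (*get_minstack_fn)(const pthread_attr_t *attr);

static size_t watch_stack_size(pthread_attr_t *attr)
{
    static size_t stack_size;   /* computed once, then reused */

    if (!stack_size) {
        /* Run-time lookup: the library still links where the symbol is absent. */
        get_minstack_fn getsz =
            (get_minstack_fn)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
        if (getsz)
            stack_size = getsz(attr);
        /* Never use a smaller stack than the old constant. */
        if (stack_size < READ_THREAD_STACKSIZE)
            stack_size = READ_THREAD_STACKSIZE;
    }
    return stack_size;
}
```

Caching the size in a static, as V2 does, means the lookup runs only once per process. In xs.c this happens under the request mutex, so concurrent first calls are not a concern there.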
[Xen-devel] [PATCH V2] libxl: set channel devid when not provided by application
Applications like libvirt may not populate a device devid field, delegating that to libxl. If needed, the application can later retrieve the libxl-produced devid. Indeed most devices are handled this way in libvirt, channel devices included. This works well when only one channel device is defined, but more than one results in qemu-system-i386: -chardev socket,id=libxl-channel-1,\ path=/tmp/test-org.qemu.guest_agent.00,server,nowait: Duplicate ID 'libxl-channel-1' for chardev Besides the odd '-1' value in the id, multiple channels have the same id, causing qemu to fail. A simple fix is to set an uninitialized devid (-1) to the dev_num passed to libxl__init_console_from_channel(). Signed-off-by: Jim Fehlig --- V2: Set console devid to channel devid as part of initializing a console from a channel. tools/libxl/libxl_console.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c index 39d8430df8..9a02a23c2a 100644 --- a/tools/libxl/libxl_console.c +++ b/tools/libxl/libxl_console.c @@ -401,6 +401,9 @@ int libxl__init_console_from_channel(libxl__gc *gc, /* Perform validation first, allocate second. */ +if (channel->devid == -1) +channel->devid = dev_num; + if (!channel->name) { LOG(ERROR, "channel %d has no name", channel->devid); return ERROR_INVAL; @@ -446,7 +449,7 @@ int libxl__init_console_from_channel(libxl__gc *gc, abort(); } -console->devid = dev_num; +console->devid = channel->devid; console->consback = LIBXL__CONSOLE_BACKEND_IOEMU; console->backend_domid = channel->backend_domid; console->name = libxl__strdup(NOGC, channel->name); -- 2.16.1
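A minimal sketch of why the change makes the chardev IDs unique: an unset devid (-1) is replaced by the channel's index before the ID string is built. The struct and function names are hypothetical stand-ins, not libxl's types. Only the `libxl-channel-<devid>` format is taken from the QEMU error above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for a libxl channel device. */
struct channel {
    int devid;          /* -1 means "let the library choose" */
    const char *name;
};

/* Mirrors the fix: an unset devid takes the index passed in (dev_num). */
static void init_channel_devid(struct channel *ch, int dev_num)
{
    if (ch->devid == -1)
        ch->devid = dev_num;
}

/* Build a QEMU chardev id in the "libxl-channel-<devid>" form seen in the
 * error message; returns snprintf's result (characters that would be written). */
static int chardev_id(const struct channel *ch, char *buf, size_t len)
{
    return snprintf(buf, len, "libxl-channel-%d", ch->devid);
}
```

Without the defaulting step, every channel keeps devid -1 and produces the same `libxl-channel--1`-style ID, which is the duplicate QEMU rejects. An application-supplied devid is left untouched.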
Re: [Xen-devel] [PATCH v2] tools/xenstore: try to get minimum thread stack size for watch thread
On 03/02/2018 05:40 AM, Wei Liu wrote: On Fri, Mar 02, 2018 at 12:29:31PM +, Wei Liu wrote: On Mon, Feb 26, 2018 at 09:53:38AM -0700, Jim Fehlig wrote: On 02/26/2018 01:46 AM, Juergen Gross wrote: When creating a pthread in xs_watch() try to get the minimal needed size of the thread from glibc instead of using a constant. This avoids problems when the library is used in programs with large per-thread memory. Use dlsym() to get the pointer to __pthread_get_minstack() in order to avoid linkage problems and fall back to the current constant size if not found. Signed-off-by: Juergen Gross --- V2: - use _GNU_SOURCE (Wei Liu) - call __pthread_get_minstack() with parameter - add -ldl to correct make flags - ensure to not using smaller stack size than today --- tools/xenstore/Makefile | 4 tools/xenstore/xs.c | 21 - 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile index 2b99d2bc1b..0831be0b6f 100644 --- a/tools/xenstore/Makefile +++ b/tools/xenstore/Makefile @@ -100,6 +100,10 @@ libxenstore.so.$(MAJOR): libxenstore.so.$(MAJOR).$(MINOR) ln -sf $< $@ xs.opic: CFLAGS += -DUSE_PTHREAD +ifeq ($(CONFIG_Linux),y) +xs.opic: CFLAGS += -DUSE_DLSYM +libxenstore.so.$(MAJOR).$(MINOR): LDFLAGS += -ldl +endif Dropping this patch in one of my automated builds caused a libxenstore link failure [ 99s] gcc -lsystemd -ldl -pthread -Wl,-soname -Wl,libxenstore.so.3.0 -shared -o libxenstore.so.3.0.3 xs.opic xs_lib.opic /home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/libs/toolcore/libxentoolcore.so [ 99s] /home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/xenstore/libxenstore.so: undefined reference to `dlsym' I hacked around it by appending '-ldl' to the end of the subsequent libxenstore.so rule. Hmm... Maybe I'm a bit dense today. I know the position of -l matters but I don't quite understand how placing -pthread before xs.opic works but -ldl doesn't. xs.c uses both after all.
I'm indeed very dense -- -pthread is a special option that sets the proper flags for linking the pthread library for both the preprocessor and linker. But still, Juergen must have tested the change, so I wonder why it doesn't work in your setup. What is your build environment? Gcc version? I dropped the patch in a package build on the openSUSE build service, where gcc7 was used. But I don't see the problem when building from sources with gcc7. Apparently we have a bug in our package build, so ignore this comment. Tested-by still stands though :-). Regards, Jim
Re: [Xen-devel] [libvirt test] 118006: regressions - FAIL
On 01/15/2018 07:49 AM, osstest service owner wrote: flight 118006 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/118006/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-build fail REGR. vs. 117772 build-i386-libvirt 6 libvirt-build fail REGR. vs. 117772 build-arm64-libvirt 6 libvirt-build fail REGR. vs. 117772 build-armhf-libvirt 6 libvirt-build fail REGR. vs. 117772 Should be fixed by https://libvirt.org/git/?p=libvirt.git;a=commit;h=66aa7e02c69cd90995f29dbfaca6c659ffe11693 Regards, Jim
Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"
On 3/25/20 1:11 AM, Jan Beulich wrote: On 24.03.2020 19:39, Julien Grall wrote: On 24/03/2020 16:13, Jan Beulich wrote: On 24.03.2020 16:21, Hongyan Xia wrote: From: Hongyan Xia In contrast, after dropping that commit, parallel domain destructions will just fail to take the domctl lock, creating a hypercall continuation and backing off immediately, allowing the thread that holds the lock to destroy a domain much more quickly and allowing backed-off threads to process events and irqs. On a 144-core server with 4TiB of memory, destroying 32 guests (each with 4 vcpus and 122GiB memory) simultaneously takes: before the revert: 29 minutes; after the revert: 6 minutes. This wants comparing against numbers demonstrating the bad effects of the global domctl lock. IIRC they were quite a bit higher than 6 min, perhaps depending on guest properties. Your original commit message doesn't contain any clue in which cases the domctl lock was an issue. So please provide information on the setups where you think it will make things worse. I never observed the issue myself - let's see whether one of the SUSE people possibly involved in this back then recalls (or has further pointers; Jim, Charles?), or whether any of the (partly former) Citrix folks do. My vague recollection is that the issue was the tool stack as a whole stalling for far too long, in particular when destroying very large guests. I too only have a vague memory of the issue but do recall shutting down large guests (e.g. 500GB) taking a long time and blocking other toolstack operations. I haven't checked on the behavior in quite some time though. One important aspect not discussed in the commit message at all is that holding the domctl lock blocks basically _all_ tool stack operations (including e.g. creation of new guests), whereas the new issue attempted to be addressed is limited to just domain cleanup. I more vaguely recall shutting down the host taking a *long* time when dom0 had large amounts of memory, e.g.
when it had all host memory (no dom0_mem= setting and autoballooning enabled). Regards, Jim
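The trade-off in this thread is between serializing destruction under the global domctl lock and letting contending callers back off. Below is a minimal user-space sketch of the back-off pattern. It assumes a pthread mutex stands in for the domctl lock and a returned `-EAGAIN` stands in for a hypercall continuation. The names are hypothetical, and this is not Xen's implementation.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the global domctl lock. */
static pthread_mutex_t domctl_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * One bounded destruction step. Instead of blocking on the lock, a caller
 * that cannot take it returns -EAGAIN at once so it can re-issue the
 * request later (the analogue of a hypercall continuation). -EAGAIN is
 * also returned while work remains; 0 means the "domain" is fully freed.
 */
static int destroy_step(int *pages_left, int batch)
{
    if (pthread_mutex_trylock(&domctl_lock) != 0)
        return -EAGAIN;                 /* busy: back off immediately */

    *pages_left -= batch;               /* pretend to free one batch */
    if (*pages_left < 0)
        *pages_left = 0;

    pthread_mutex_unlock(&domctl_lock);
    return *pages_left ? -EAGAIN : 0;
}
```

The sketch only bounds how long a contending caller waits. Any other operation that needs the same lock is still excluded while a step runs, which is the concern Jan raises about the global lock blocking all toolstack operations.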
Re: [libvirt test] 149773: regressions - FAIL
On 4/24/20 3:53 AM, osstest service owner wrote: flight 149773 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/149773/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-build fail REGR. vs. 146182 build-i386-libvirt 6 libvirt-build fail REGR. vs. 146182 build-arm64-libvirt 6 libvirt-build fail REGR. vs. 146182 build-armhf-libvirt 6 libvirt-build fail REGR. vs. 146182 Probably best to disable these tests to avoid all the spam. Regards, Jim
[Xen-devel] [OSSTEST PATCH] build: fix configuration of libvirt
libvirt.git commit 2621d48f00 removed the last traces of gnulib, which also removed the '--no-git' option from autogen.sh. Unknown options are now passed to the configure script, which quickly fails with configure: error: unrecognized option: `--no-git' Remove the gnulib handling from ts-libvirt-build, including the '--no-git' option to autogen.sh. While at it remove configure options no longer supported by the libvirt configure script. Signed-off-by: Jim Fehlig --- I have poor perl skills, but hopefully this fixes the latest build failures of the libvirt test project, e.g. http://logs.test-lab.xenproject.org/osstest/logs/146921/build-amd64-libvirt/6.ts-libvirt-build.log ts-libvirt-build | 16 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/ts-libvirt-build b/ts-libvirt-build index e799f003..ac5afcf2 100755 --- a/ts-libvirt-build +++ b/ts-libvirt-build @@ -26,8 +26,7 @@ tsreadconfig(); selectbuildhost(\@ARGV); builddirsprops(); -our %submodmap = qw(gnulib gnulib -keycodemapdb keycodemapdb); +our %submodmap = qw(keycodemapdb keycodemapdb); our $submodules; sub libvirtd_init (); @@ -50,12 +49,6 @@ sub config() { } die "no xen prefix" unless $xenprefix; -# Uses --no-git because otherwise autogen.sh will undo -# submodulefixup's attempts to honour -# revision_libvirt_gnulib. This in turn requires that we specify -# --gnulib-srcdir, but ./autogen.sh doesn't propagate -# --gnulib-srcdir to ./bootstap so we use GNULIB_SRCDIR directly. 
-my $gnulib = submodule_find($submodules, "gnulib"); target_cmd_build($ho, 3600, $builddir, <{Path} \\ -../autogen.sh --no-git \\ - --with-libxl --without-xen --without-xenapi --without-selinux \\ - --without-lxc --without-vbox --without-uml \\ +../autogen.sh \\ + --with-libxl --without-selinux \\ + --without-lxc --without-vbox \\ --without-qemu --without-openvz --without-vmware \\ --sysconfdir=/etc --localstatedir=/var #/ END -- 2.25.0
[Xen-devel] [OSSTEST PATCH V2] build: fix configuration of libvirt
libvirt.git commit 2621d48f00 removed the last traces of gnulib, which also removed the '--no-git' option from autogen.sh. Unknown options are now passed to the configure script, which quickly fails with configure: error: unrecognized option: `--no-git' Remove the gnulib handling from ts-libvirt-build, including the '--no-git' option to autogen.sh. While at it remove configure options no longer supported by the libvirt configure script. Signed-off-by: Jim Fehlig --- The only change from V1 is adding Ian to cc. I have poor perl skills, but hopefully this fixes the latest build failures of the libvirt test project, e.g. http://logs.test-lab.xenproject.org/osstest/logs/146921/build-amd64-libvirt/6.ts-libvirt-build.log ts-libvirt-build | 16 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/ts-libvirt-build b/ts-libvirt-build index e799f003..ac5afcf2 100755 --- a/ts-libvirt-build +++ b/ts-libvirt-build @@ -26,8 +26,7 @@ tsreadconfig(); selectbuildhost(\@ARGV); builddirsprops(); -our %submodmap = qw(gnulib gnulib -keycodemapdb keycodemapdb); +our %submodmap = qw(keycodemapdb keycodemapdb); our $submodules; sub libvirtd_init (); @@ -50,12 +49,6 @@ sub config() { } die "no xen prefix" unless $xenprefix; -# Uses --no-git because otherwise autogen.sh will undo -# submodulefixup's attempts to honour -# revision_libvirt_gnulib. This in turn requires that we specify -# --gnulib-srcdir, but ./autogen.sh doesn't propagate -# --gnulib-srcdir to ./bootstap so we use GNULIB_SRCDIR directly. 
-my $gnulib = submodule_find($submodules, "gnulib"); target_cmd_build($ho, 3600, $builddir, <{Path} \\ -../autogen.sh --no-git \\ - --with-libxl --without-xen --without-xenapi --without-selinux \\ - --without-lxc --without-vbox --without-uml \\ +../autogen.sh \\ + --with-libxl --without-selinux \\ + --without-lxc --without-vbox \\ --without-qemu --without-openvz --without-vmware \\ --sysconfdir=/etc --localstatedir=/var #/ END -- 2.25.0
Re: [Xen-devel] [OSSTEST PATCH V2] build: fix configuration of libvirt
On 2/14/20 10:47 AM, Ian Jackson wrote: Jim Fehlig writes ("[OSSTEST PATCH V2] build: fix configuration of libvirt"): libvirt.git commit 2621d48f00 removed the last traces of gnulib, which also removed the '--no-git' option from autogen.sh. Unknown options are now passed to the configure script, which quickly fails with configure: error: unrecognized option: `--no-git' Remove the gnulib handling from ts-libvirt-build, including the '--no-git' option to autogen.sh. While at it remove configure options no longer supported by the libvirt configure script. Harmf. Thanks for looking into this and trying to fix this mess. I think there is a problem with your patch, which is that 2621d48f00 is recent enough that we might still want to be able to build with earlier versions. Ah, good point. Is there an easy way to tell (by looking at the tree after checkout, maybe) whether to do the old or the new thing? There would be no gnulib directory in a tree checked out after commit 2621d48f00. Another option is to check for the 'bootstrap' script in the root of the tree, which was removed by 2621d48f00. Your perl code looks good to me for what it is trying to do. I'm afraid my perl is too weak to quickly hack something up to support both pre- and post-gnulib builds :-(. I'll add this task to my list if you don't have time for it. Regards, Jim
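Jim's suggestion, checking for the `bootstrap` script that commit 2621d48f00 removed, could look like the sketch below. osstest itself is Perl, so this C version only illustrates the decision. `tree_uses_gnulib` and `autogen_extra_args` are hypothetical names. A real fix would also need to keep the older configure options (such as `--without-xen`) for pre-2621d48f00 trees.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* access() */

/* Trees from before libvirt commit 2621d48f00 still ship ./bootstrap
 * (alongside the gnulib submodule); newer trees have neither. */
static bool tree_uses_gnulib(const char *srcdir)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/bootstrap", srcdir);
    if (n < 0 || (size_t)n >= sizeof path)
        return false;             /* path too long: treat as a new tree */
    return access(path, F_OK) == 0;
}

/* Only old trees understand autogen.sh --no-git; newer ones pass unknown
 * options to configure, which rejects them. */
static const char *autogen_extra_args(const char *srcdir)
{
    return tree_uses_gnulib(srcdir) ? "--no-git" : "";
}
```

Checking for a file that the commit deleted keeps the test independent of git history, so it also works on an exported tarball of either vintage.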
Re: [Xen-devel] libvirt support for scheduler credit2
On 1/21/20 10:05 AM, Jürgen Groß wrote: > On 21.01.20 17:56, Kevin Stange wrote: >> Hi, >> >> I looked around a bit and wasn't able to find a good answer to this, so >> George suggested I ask here. > > Cc-ing Jim. > >> >> Since Xen 4.12, credit2 is the default scheduler, but at least as of >> libvirt 5.1.0 virsh doesn't appear to understand credit2 and produces >> this sort of output: You would see the same with libvirt.git master, sorry. ATM the libvirt libxl driver is unaware of the credit2 scheduler. Hmm, as I recall Dario was going to provide a patch for libvirt :-). But he is quite busy so it will have to be added to my very long todo list. Regards, Jim >> >> # xl sched-credit2 -d yw6hk7mo6zy3k8 >> Name ID Weight Cap >> yw6hk7mo6zy3k8 4 10 0 >> # virsh schedinfo yw6hk7mo6zy3k8 >> Scheduler : credit2 >> >> Compared to a host running credit: >> >> # xl sched-credit -d gvz2b16sq38dv9 >> Name ID Weight Cap >> gvz2b16sq38dv9 14 800 0 >> # virsh schedinfo gvz2b16sq38dv9 >> Scheduler : credit >> weight : 800 >> cap : 0 >> >> Trying to change the weight does nothing, not even producing an error >> message: >> >> # virsh schedinfo syuxplsmdihcwc --weight 300 >> Scheduler : credit2 >> >> # xl sched-credit2 -d syuxplsmdihcwc >> Name ID Weight Cap >> syuxplsmdihcwc 23 400 0 >> >> Is there a version of libvirt where I can expect this to work, or is it >> not supported yet? As a workaround for now I've added sched=credit to >> my command line, but it would be nice to gain the benefits of improved >> scheduling at some point.
Re: [Xen-devel] libvirt support for scheduler credit2
On 1/29/20 4:10 AM, Dario Faggioli wrote: > On Wed, 2020-01-22 at 18:56 +0000, Jim Fehlig wrote: >> On 1/21/20 10:05 AM, Jürgen Groß wrote: >>> On 21.01.20 17:56, Kevin Stange wrote: >>>> >>>> Since Xen 4.12, credit2 is the default scheduler, but at least as >>>> of >>>> libvirt 5.1.0 virsh doesn't appear to understand credit2 and >>>> produces >>>> this sort of output: >> >> You would see the same with libvirt.git master, sorry. ATM the >> libvirt libxl >> driver is unaware of the credit2 scheduler. >> > Right. I just sent the patch: > https://www.redhat.com/archives/libvir-list/2020-January/msg01292.html Thanks! I tweaked it a bit and committed to libvirt.git https://libvirt.org/git/?p=libvirt.git;a=commit;h=849052ec61e18780713bec171748e859e32dfd6d Regards, Jim
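In Kevin's report, credit2 domains show no tunables in `virsh schedinfo`, and `--weight` is silently ignored, while `xl sched-credit2` reports both Weight and Cap. A driver fix therefore amounts to recognizing credit2 wherever credit is handled. The dispatch below is a hypothetical sketch: the enum, struct and function names are invented for illustration and are neither libxl's nor libvirt's, and the committed libvirt change may be structured differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical scheduler kinds; names are illustrative only. */
enum sched_kind { SCHED_UNKNOWN, SCHED_CREDIT, SCHED_CREDIT2 };

/* Tunables a driver could expose through "virsh schedinfo". */
struct sched_caps {
    bool weight;
    bool cap;
};

static enum sched_kind sched_from_name(const char *name)
{
    if (strcmp(name, "credit") == 0)
        return SCHED_CREDIT;
    if (strcmp(name, "credit2") == 0)
        return SCHED_CREDIT2;
    return SCHED_UNKNOWN;
}

/* Both credit and credit2 report Weight and Cap in the xl output above. */
static struct sched_caps sched_caps_for(enum sched_kind kind)
{
    struct sched_caps caps = { false, false };

    switch (kind) {
    case SCHED_CREDIT:
    case SCHED_CREDIT2:
        caps.weight = true;
        caps.cap = true;
        break;
    default:
        break;              /* unknown scheduler: expose nothing */
    }
    return caps;
}
```

With a table like this, a request to set a tunable the scheduler does not support could be rejected with an error instead of being dropped silently, which was the confusing part of the original report.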
Re: [PATCH 00/14] deprecations: remove many old deprecations
Adding xen-devel and Ian to cc. On 2/24/21 6:11 AM, Daniel P. Berrangé wrote: The following features have been deprecated for well over the 2 release cycle we promise This reminded me of a bug report we received late last year when updating to 5.2.0. 'virsh setvcpus' suddenly stopped working for Xen HVM guests. Turns out libxl uses cpu-add under the covers. ``-usbdevice`` (since 2.10.0) ``-drive file=json:{...{'driver':'file'}}`` (since 3.0) ``-vnc acl`` (since 4.0.0) ``-mon ...,control=readline,pretty=on|off`` (since 4.1) ``migrate_set_downtime`` and ``migrate_set_speed`` (since 2.8.0) ``query-named-block-nodes`` result ``encryption_key_missing`` (since 2.10.0) ``query-block`` result ``inserted.encryption_key_missing`` (since 2.10.0) ``migrate-set-cache-size`` and ``query-migrate-cache-size`` (since 2.11.0) ``query-named-block-nodes`` and ``query-block`` result dirty-bitmaps[i].status (since 4.0) ``query-cpus`` (since 2.12.0) ``query-cpus-fast`` ``arch`` output member (since 3.0.0) ``query-events`` (since 4.0) chardev client socket with ``wait`` option (since 4.0) ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) ``ide-drive`` (since 4.2) ``scsi-disk`` (since 4.2) AFAICT, libvirt has ceased to use all of these too. A quick grep of the libxl code shows it uses -usbdevice, query-cpus, and scsi-disk. There are many more similarly old deprecations not (yet) tackled. The Xen tools maintainers will need to be more vigilant about the deprecations. I don't follow Xen development closely enough to know if this topic has already been discussed. Regards, Jim
Re: [libvirt test] 151910: regressions - FAIL
On 7/15/20 9:07 AM, osstest service owner wrote: flight 151910 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/151910/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-build fail REGR. vs. 151777 build-i386-libvirt 6 libvirt-build fail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-build fail REGR. vs. 151777 build-armhf-libvirt 6 libvirt-build fail REGR. vs. 151777 I see the same configure failure has been encountered since July 11: checking for XDR... no configure: error: You must install the libtirpc >= 0.1.10 pkg-config module to compile libvirt AFAICT there have been no related changes in libvirt (which has required libtirpc for over two years). Has this package changed in Debian, or is it no longer part of a base build config? Regards, Jim
[PATCH] OSSTEST: Install libtirpc-dev for libvirt builds
The check for XDR support was changed in libvirt commit d7147b3797 to use the libtirpc pkg-config module instead of complicated AC_CHECK_LIB, AC_COMPILE_IFELSE, et al. logic. The libvirt OSSTEST has been failing since this change hit libvirt.git master. Fix it by adding libtirpc-dev to the list of 'extra_packages' installed for libvirt builds. Signed-off-by: Jim Fehlig --- I *think* this change will work for older libvirt branches too. The old, hand-coded m4 logic should work with libtirpc-dev installed. Osstest/Toolstack/libvirt.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm index e817f5b4..11e4d730 100644 --- a/Osstest/Toolstack/libvirt.pm +++ b/Osstest/Toolstack/libvirt.pm @@ -26,7 +26,7 @@ use XML::LibXML; sub new { my ($class, $ho, $methname,$asset) = @_; -my @extra_packages = qw(libavahi-client3); +my @extra_packages = qw(libavahi-client3 libtirpc-dev); my $nl_lib = "libnl-3-200"; my $libgnutls = "libgnutls30"; -- 2.26.2
Re: [libvirt test] 151910: regressions - FAIL
On 7/15/20 1:53 PM, Jim Fehlig wrote: On 7/15/20 9:07 AM, osstest service owner wrote: flight 151910 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/151910/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-build fail REGR. vs. 151777 build-i386-libvirt 6 libvirt-build fail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-build fail REGR. vs. 151777 build-armhf-libvirt 6 libvirt-build fail REGR. vs. 151777 I see the same configure failure has been encountered since July 11 checking for XDR... no configure: error: You must install the libtirpc >= 0.1.10 pkg-config module to compile libvirt AFAICT there have been no related changes in libvirt (which has required libtirpc for over two years). Sorry for the mistake. There has been a change in libvirt https://gitlab.com/libvirt/libvirt/-/commit/d7147b3797380de2d159ce6324536f3e1f2d97e3 My reputation for OSSTEST patches is not the greatest, but I took a stab at it regardless :-) https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg01208.html Regards, Jim
Re: [PATCH] OSSTEST: Install libtirpc-dev for libvirt builds
On 8/10/20 4:13 AM, Ian Jackson wrote: Jim Fehlig writes ("[PATCH] OSSTEST: Install libtirpc-dev for libvirt builds"): The check for XDR support was changed in libvirt commit d7147b3797 to use libtirpc pkg-config instead of complicated AC_CHECK_LIB, AC_COMPILE_IFELSE, et. al. logic. The libvirt OSSTEST has been failing since this change hit libvirt.git master. Fix it by adding libtirpc-dev to the list of 'extra_packages' installed for libvirt builds. Signed-off-by: Jim Fehlig Reviewed-by: Ian Jackson Thanks! I will push this to osstest pretest shortly. Thanks Ian! Perhaps you've noticed libvirt has now moved to the meson build system. My weak perl skills have discouraged me from investigating ways to accommodate that. Regards, Jim
Re: [libvirt test] 149773: regressions - FAIL
On 6/4/20 6:51 AM, Ian Jackson wrote: Jim Fehlig writes ("Re: [libvirt test] 149773: regressions - FAIL"): On 4/24/20 3:53 AM, osstest service owner wrote: flight 149773 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/149773/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-build fail REGR. vs. 146182 build-i386-libvirt 6 libvirt-build fail REGR. vs. 146182 build-arm64-libvirt 6 libvirt-build fail REGR. vs. 146182 build-armhf-libvirt 6 libvirt-build fail REGR. vs. 146182 Probably best to disable these tests to avoid all the spam. I have fixed the build bug now... I saw your patch on the libvirt dev list, thanks! I'm a bit embarrassed for not considering a fix on the libvirt side while trying to address this a few months back :-/. I suspect the upcoming move to meson will be a bit more disruptive and will likely require changes to osstest. Regards, Jim
Re: [Discussion]: Making "LIBXL_HOTPLUG_TIMEOUT" configurable through 'xl.conf'
On 11/24/23 06:04, Olaf Hering wrote: Fri, 24 Nov 2023 13:47:53 +0100 Juergen Gross : As Olaf has said already: this wouldn't cover actions e.g. by libvirt. Jim pointed me to /etc/libvirt/libxl.conf. So from this perspective both xl and libvirt is covered. Now it just takes someone to implement it. I like Juergen's idea of libxl.conf or xen.conf for Xen. This would avoid the duplicate effort of adding support for such host-wide settings to the configuration of external libxl toolstacks like libvirt. And external stacks could immediately use any new settings added to the Xen configuration. Regards, Jim
vnuma_nodes missing pnode 0
Hi All, While fixing [1] a recent downstream libvirt build failure against 4.17 rc3, I noticed the json representation of libxl_vnode_info omits pnode when value is 0. The problem can be seen by starting a VM containing the following vnuma config vnuma = [ [ "pnode=0", "size=2048", "vcpus=0", "vdistances=10,20" ], [ "pnode=1", "size=2048", "vcpus=1", "vdistances=20,10" ] ] The json representation for this config does not contain pnode 0 "vnuma_nodes": [ { "memkb": 2097152, "distances": [ 10, 20 ], "vcpus": [ 0 ] }, { "memkb": 2097152, "distances": [ 20, 10 ], "pnode": 1, "vcpus": [ 1 ] } ], I'm not familiar with the code generator for the *_to_json functions, but with a hint I can probably cook up a patch :-). Regards, Jim [1] https://listman.redhat.com/archives/libvir-list/2022-November/235745.html
Re: vnuma_nodes missing pnode 0
On 11/14/22 01:18, Jan Beulich wrote: On 14.11.2022 07:43, Henry Wang wrote: Sorry, missed Anthony (the toolstack maintainer). Also added him to this thread. Indeed there's nothing x86-ish in here, it's all about data representation. It merely happens to be (for now) x86-specific data which is being dealt with. Internally I indicated to Jim that the way the code presently is generated it looks to me as if 0 was simply taken as the default for "pnode". What I don't know at all is whether the concept of any kind of default is actually valid in json representation of guest configs. 0 is definitely ignored in the generated libxl_vnode_info_gen_json() function, which essentially has `if (p->pnode) format-json`. I took a quick peek at the generator, but being totally unfamiliar with it I could not spot a fix. I'm also not sure how such a fix could be detected for testing purposes by libxl users like libvirt, i.e. how to distinguish a libxl that emits `"pnode": 0` in the json representation of a libxl_domain_config object from one that does not. Jim
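The `if (p->pnode)` pattern Jim describes can be sketched roughly as follows. This is a hypothetical, stripped-down illustration, not the generator's real output: the `vnode_info` type and `vnode_to_json()` helper are invented for the example, only `memkb` and `pnode` are modelled, and the JSON is built with `snprintf` rather than through libxl's own JSON helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for libxl_vnode_info: the real type
 * also carries distances and a vcpus bitmap. */
typedef struct {
    unsigned long memkb;
    unsigned int pnode;
} vnode_info;

/* Write one vnode as a JSON object into buf (assumed large enough). */
static void vnode_to_json(const vnode_info *p, char *buf, size_t len)
{
    size_t used = 0;
    int n;

    n = snprintf(buf, len, "{ \"memkb\": %lu", p->memkb);
    assert(n >= 0 && (size_t)n < len);          /* no truncation */
    used += (size_t)n;

    /* Same shape as the generated code: the field is only written when it
     * differs from its default of 0, so pnode == 0 never appears. */
    if (p->pnode) {
        n = snprintf(buf + used, len - used, ", \"pnode\": %u", p->pnode);
        assert(n >= 0 && (size_t)n < len - used);
        used += (size_t)n;
    }

    n = snprintf(buf + used, len - used, " }");
    assert(n >= 0 && (size_t)n < len - used);
}
```

With `pnode = 0` the helper writes `{ "memkb": 2097152 }`, and with `pnode = 1` it writes `{ "memkb": 2097152, "pnode": 1 }`, the same asymmetry visible in the vnuma_nodes output earlier in the thread.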
Re: vnuma_nodes missing pnode 0
On 11/14/22 10:56, Anthony PERARD wrote: On Mon, Nov 14, 2022 at 08:53:17AM -0700, Jim Fehlig wrote: On 11/14/22 01:18, Jan Beulich wrote: On 14.11.2022 07:43, Henry Wang wrote: Sorry, missed Anthony (the toolstack maintainer). Also added him to this thread. Indeed there's nothing x86-ish in here, it's all about data representation. It merely happens to be (for now) x86-specific data which is being dealt with. Internally I indicated to Jim that the way the code presently is generated it looks to me as if 0 was simply taken as the default for "pnode". What I don't know at all is whether the concept of any kind of default is actually valid in json representation of guest configs. 0 is definitely ignored in the generated libxl_vnode_info_gen_json() function, which essentially has `if (p->pnode) format-json`. I took a quick peek at the generator, but being totally unfamiliar with it I could not spot a fix. I'm also not sure how such a fix could be detected for testing purposes by libxl users like libvirt, i.e. how to distinguish a libxl that emits `"pnode": 0` in the json representation of a libxl_domain_config object from one that does not. Well, the missing `"pnode": 0` in json isn't exactly a bug, it's been done on purpose, see https://xenbits.xen.org/gitweb/?p=xen.git;h=731233d64f6a7602c1ca297f7b67ec254 When the JSON is reloaded into its original struct, libxl_vnode_info, pnode will have the expected value, that is 0, because libxl_vnode_info_init() would have reset this field to 0. I don't think it's possible to change the generator to just have it generate '"pnode": 0', as if we make a change, it would have to be for all unsigned ints, I think, which would likely cause lots of libvirt libxlxml2domconfig test failures. Do you actually want all of those in the json, or is it just a case of it looking like there's a missing part? The latter. ATM, libvirt only uses the json in its unit tests. No functionality is affected. I'm fine with the status quo if you are :-). Thanks, Jim
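Anthony's point, that omitting a zero pnode loses nothing because libxl_vnode_info_init() resets the field to 0 before the JSON is parsed back, can be illustrated with a minimal sketch. The `vnode_info` type, `vnode_init()` and `vnode_load()` are hypothetical stand-ins, and the parsed JSON is represented as a plain array of key/value pairs instead of going through a real JSON parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for libxl_vnode_info. */
typedef struct {
    unsigned long memkb;
    unsigned int pnode;
} vnode_info;

/* Analogue of libxl_vnode_info_init(): every field gets its default, 0. */
static void vnode_init(vnode_info *p)
{
    memset(p, 0, sizeof(*p));
}

/* Hypothetical loader: start from the defaults, then apply only the keys
 * that were present in the JSON. Missing keys keep their default of 0. */
static void vnode_load(vnode_info *p, const char *const keys[],
                       const unsigned long vals[], size_t n)
{
    assert(p != NULL);
    vnode_init(p);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(keys[i], "memkb") == 0)
            p->memkb = vals[i];
        else if (strcmp(keys[i], "pnode") == 0)
            p->pnode = (unsigned int)vals[i];
    }
}
```

Loading only `{ "memkb": 2097152 }` therefore yields `pnode == 0`, the value the original config had. The trade-off, as the thread notes, is that anyone inspecting the raw JSON (such as libvirt's unit tests) cannot tell an explicit `pnode=0` from a defaulted one, since both serialize identically.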