Re: [Xen-devel] [libvirt test] 140186: regressions - FAIL

2019-08-16 Thread Jim Fehlig
On 8/16/19 7:01 AM, osstest service owner wrote:
> flight 140186 libvirt real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/140186/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  build-amd64-libvirt   6 libvirt-build   fail REGR. vs. 139829
>  build-i386-libvirt    6 libvirt-build   fail REGR. vs. 139829
>  build-arm64-libvirt   6 libvirt-build   fail REGR. vs. 139829
>  build-armhf-libvirt   6 libvirt-build   fail REGR. vs. 139829

Should be fixed now by commit 3b7c5ab9

https://libvirt.org/git/?p=libvirt.git;a=commit;h=3b7c5ab983f4655ae02b8af4517d89839530ee5f

Regards,
Jim
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [libvirt] domXML modeling question

2019-03-04 Thread Jim Fehlig
Adding xen-devel to cc in case anyone there wants to comment on my latest 
proposal...


On 2/20/19 5:20 PM, Jim Fehlig wrote:
> There have been a few requests [1][2] to support Xen's max_grant_frames setting 
> in libvirt domXML, but I'm not quite sure how to model it. The documentation [3] 
> on this setting states:
>
>   Specify the maximum number of grant frames the domain is allowed to have.  This
>   value controls how many pages the domain is able to grant access to for other
>   domains, needed e.g. for the operation of paravirtualized devices.  The default
>   is settable via xl.conf(5).


I've sent a patch to introduce an analogous default in the libvirt libxl driver

https://www.redhat.com/archives/libvir-list/2019-March/msg00123.html



> It smells of a  setting, e.g. the amount of memory a domain can share, 
> but doesn't map to any of the existing settings. A new subelement  
> doesn't feel right. Does anyone suggest a better way of modeling max_grant_frames?


After discussing the max_grant_frames setting a bit more with Juergen I had the 
idea to model it as IO buffer space (or DMA space) of a xenbus "controller". All 
PV devices in the guest connect to the xenbus controller and make use of the 
available I/O buffer space. Guests with more PV devices requiring more buffer 
can increase the space on the xenbus controller device.


One small wrinkle in this idea is that we currently don't model xenbus in 
libvirt. I'd need to add support for a new xenbus controller type and start 
implicitly creating it when creating guests with PV devices, similar to 
auto-creation of controllers in the qemu driver. Also, there is no existing 
controller setting for specifying buffer space. Perhaps a 'ram' attribute could 
be added, similar to specifying memory for  devices? E.g.


  

Any opinion on this approach? Or other ideas for modeling this setting in 
libvirt?

Regards,
Jim



> Another option I considered is setting the value based on number of PV devices, 
> but I think that flies in the face of libvirt's policy of not dictating policy. 
> Regardless of domain config modeling I can work on a driver-wide setting in 
> libxl.conf, similar to Xen's xl.conf(5) global.
>
> Regards,
> Jim
>
> [1] https://www.redhat.com/archives/libvir-list/2018-April/msg00216.html
> [2] https://www.redhat.com/archives/libvirt-users/2019-January/msg00011.html
> [3] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/man/xl.cfg.5.pod.in;h=ad81af1ed8cc983c76b5ec2c3aa02e28f042cc63;hb=HEAD#l569




Re: [Xen-devel] [XEN PATCH for-4.13 v2 1/9] libxl: Offer API versions 0x040700 and 0x040800

2019-11-25 Thread Jim Fehlig
On 10/10/19 9:11 AM, Ian Jackson wrote:
> According to git log -G:
> 
> 0x040700 was introduced in 304400459ef0 (aka 4.7.0-rc1~481)
>    "tools/libxl: rename remus device to checkpoint device"
> 
> 0x040800 was introduced in 57f8b13c7240 (aka 4.8.0-rc1~437)
>    "libxl: memory size in kb requires 64 bit variable"
> 
> It is surprising that no-one noticed this.

I am now noticing it :-(.

As Anthony noted in V1, libvirt uses LIBXL_API_VERSION and currently has it set 
to 0x040500. I'm attempting to bump libvirt's minimum supported Xen version to 
4.9.0 and for that would use 0x040800, but that is not possible unless this 
commit is backported through 4.9 and picked up and released by all the 
downstreams.

Any ideas on how to use the API changes through 0x040800, while avoiding the ones 
introduced in 0x041300, would be much appreciated.
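
For readers unfamiliar with the constants discussed here, the sketch below decodes them. It assumes that each pair of hex digits is meant to be read as a decimal major, minor, or patch number, so that 0x041300 means 4.13.0 rather than 4.19.0. That reading is inferred from the values quoted in this thread; the helper name is hypothetical and is not part of libxl.

```python
def decode_libxl_api_version(value):
    """Split a LIBXL_API_VERSION-style constant into (major, minor, patch).

    Assumption: each byte is written as two decimal digits in hex notation,
    so 0x041300 is read as "04", "13", "00".
    """
    digits = f"{value:06x}"
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"unexpected version constant: {value:#x}")
    return int(digits[0:2]), int(digits[2:4]), int(digits[4:6])


# The three values mentioned in this message.
for constant in (0x040500, 0x040800, 0x041300):
    print(f"{constant:#08x} -> {decode_libxl_api_version(constant)}")
```

Reading the digits as text rather than shifting and masking bytes is deliberate: a plain byte extraction would turn the 0x13 in 0x041300 into 19.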

Regards,
Jim

Re: [Xen-devel] [OSSTEST PATCH 1/2] ts-libvirt-build: Provide PKG_CONFIG_PATH

2019-11-12 Thread Jim Fehlig
On 11/12/19 5:09 AM, Ian Jackson wrote:
> In osstest we do not install the xen tree in /usr/local because the
> build environment is shared with many different build jobs which might
> be using different versions of Xen.  We put it in a job-specific
> directory in ~osstest on the build host, and set environment variables
> to ensure that it all gets picked up.
> 
> Recent versions of libvirt insist on finding xenlight.pc; otherwise
> they disable libxl support.  So we must add a PKG_CONFIG_PATH setting.

Sorry. There was a hack to work around a Fedora 28 bug, but now that Fedora 28 
is EOL the hack has been removed:

https://libvirt.org/git/?p=libvirt.git;a=commit;h=18981877d2e20390a79d068861a24e716f8ee422

> (In all cases, contrary to the usual protocol for path-like variables,
> we do not append but instead simply set the variable.  This is OK
> because this is an osstest build script run via ssh to the build host,
> so the variables won't have been set already.)
> 
> CC: Jim Fehlig 
> Signed-off-by: Ian Jackson 
> ---
>   ts-libvirt-build | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/ts-libvirt-build b/ts-libvirt-build
> index bc08190a..2a363f43 100755
> --- a/ts-libvirt-build
> +++ b/ts-libvirt-build
> @@ -60,6 +60,7 @@ sub config() {
>   cd libvirt
>   CFLAGS="-g -I$xenprefix/include/" \\
>   LDFLAGS="-g -L$xenprefix/lib/ -Wl,-rpath-link=$xenprefix/lib/" \\
> +PKG_CONFIG_PATH="$xenprefix/lib/pkgconfig/" \\
>   GNULIB_SRCDIR=$builddir/libvirt/$gnulib->{Path} \\
>   ./autogen.sh --no-git \\
>    --with-libxl --without-xen --without-xenapi --without-selinux \\

Unrelated, but the legacy xen and xenapi drivers have been removed so the 
--without-{xen,xenapi} options could be dropped.

Regards,
Jim

Re: [Xen-devel] [OSSTEST PATCH 2/2] ts-libvirt-build: Do an out-of-tree build

2019-11-12 Thread Jim Fehlig
On 11/12/19 5:09 AM, Ian Jackson wrote:
> Recent versions of libvirt do not support in-tree builds (!)

I assumed libvirt's gradual move from autotools to meson would affect OSSTEST, 
but later rather than sooner. Sorry for not mentioning it earlier, but now you 
have been warned that libvirt is moving to meson :-). Meson has a strict 
separation between source and build directories and some preparatory patches 
were pushed that force srcdir != builddir

https://www.redhat.com/archives/libvir-list/2019-October/msg01681.html

Daniel posted a note about this change yesterday

https://www.redhat.com/archives/libvir-list/2019-November/msg00299.html

I didn't read libvirt mail yesterday; otherwise I would have forwarded that to 
xen-devel. I need to be more proactive about libvirt changes that might affect 
OSSTEST...

Regards,
Jim

> 
> Cope with this by always building in a subdirectory `build' (a
> subdirectory of the source tree); this is the arrangement which the
> libvirt upstream messages and documentation now seem to recommend (at
> least where things have been updated).
> 
> I compared the differences in build output between the results of this
> branch and a previous passing xen-unstable flight.  The libvirt
> library version increased and a file
>    usr/local/share/libvirt/cpu_map/arm_features.xml
> appeared.  I think this is just due to changes in the libvirt version,
> 2cff65e4c60e..70218e10bcde, in particular 0de541bfc575
>    cpu_map: Ship arm_features.xml
> 
> I also tested that a test job, built with current libvirt and these
> osstest changes, passes as expected.
> 
> CC: Jim Fehlig 
> Signed-off-by: Ian Jackson 
> Tested-by: Ian Jackson 
> ---
>   ts-libvirt-build | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/ts-libvirt-build b/ts-libvirt-build
> index 2a363f43..e799f003 100755
> --- a/ts-libvirt-build
> +++ b/ts-libvirt-build
> @@ -58,11 +58,13 @@ sub config() {
>   my $gnulib = submodule_find($submodules, "gnulib");
>   target_cmd_build($ho, 3600, $builddir, <<END);
>   cd libvirt
> + mkdir build
> + cd build
>   CFLAGS="-g -I$xenprefix/include/" \\
>   LDFLAGS="-g -L$xenprefix/lib/ -Wl,-rpath-link=$xenprefix/lib/" \\
>   PKG_CONFIG_PATH="$xenprefix/lib/pkgconfig/" \\
>   GNULIB_SRCDIR=$builddir/libvirt/$gnulib->{Path} \\
> -./autogen.sh --no-git \\
> +../autogen.sh --no-git \\
>    --with-libxl --without-xen --without-xenapi --without-selinux \\
>--without-lxc --without-vbox --without-uml \\
>--without-qemu --without-openvz --without-vmware \\
> @@ -72,9 +74,9 @@ END
>   
>   sub build() {
>   target_cmd_build($ho, 3600, $builddir, <<END);
> -cd libvirt
> -(make $makeflags 2>&1 && touch ../build-ok-stamp) |tee ../log
> -test -f ../build-ok-stamp #/
> +cd libvirt/build
> +(make $makeflags 2>&1 && touch ../../build-ok-stamp) |tee ../log
> +test -f ../../build-ok-stamp #/
>   echo ok.
>   END
>   }
> @@ -82,7 +84,7 @@ END
>   sub install() {
>   target_cmd_build($ho, 300, $builddir, <<END);
>   mkdir -p dist
> -cd libvirt
> +cd libvirt/build
>   make $makeflags install DESTDIR=$builddir/dist
>   mkdir -p $builddir/dist/etc/init.d
>   END
> 


Re: [Xen-devel] [OSSTEST PATCH 2/2] ts-libvirt-build: Do an out-of-tree build

2019-11-12 Thread Jim Fehlig
On 11/12/19 9:10 AM, Ian Jackson wrote:
> Hi.  Thanks for the information.
> 
> Jim Fehlig writes ("Re: [OSSTEST PATCH 2/2] ts-libvirt-build: Do an 
> out-of-tree build"):
>> I assumed libvirt's gradual move from autotools to meson would
>> affect OSSTEST, but later rather than sooner. Sorry for not
>> mentioning it earlier, but now you have been warned that libvirt is
>> moving to meson :-). Meson has a strict separation between source
>> and build directories and some preparatory patches were pushed that
>> force srcdir != builddir
>>
>> https://www.redhat.com/archives/libvir-list/2019-October/msg01681.html
> 
> I read this and some of it is a bit concerning.  Does all of this
>    src: [stuff] generate source files into build directory
> mean that previously only in-tree builds were supported and that
> therefore there is no one set of build runes that will work both
> before and after these changes ?

VPATH builds were previously supported, as well as in-tree builds. But questions 
around this work are probably best answered by the author. Adding Pavel to cc.

Pavel, for context, see Ian's OSSTEST patches to accommodate recent changes to 
libvirt's build system

https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00514.html

Regards,
Jim

Re: [Xen-devel] [PATCH v2] xl/libxl: add pvcalls support

2018-03-29 Thread Jim Fehlig

On 03/29/2018 04:07 PM, Stefano Stabellini wrote:

Add pvcalls support to libxl and xl. Create the appropriate pvcalls
entries in xenstore.

Signed-off-by: Stefano Stabellini 

---

Changes in v2:
- rename pvcalls to pvcallsif internally in libxl to avoid `pvcallss'
---
  docs/misc/xenstore-paths.markdown    |  9 +++++++++
  tools/libxl/Makefile                 |  2 +-
  tools/libxl/libxl.h                  | 10 ++++++++++
  tools/libxl/libxl_create.c           |  4 ++++
  tools/libxl/libxl_internal.h         |  1 +
  tools/libxl/libxl_pvcalls.c          | 37 +++++++++++++++++++++++++++++++++++++
  tools/libxl/libxl_types.idl          |  7 +++++++
  tools/libxl/libxl_types_internal.idl |  1 +
  tools/xl/xl_parse.c                  | 37 ++++++++++++++++++++++++++++++++++++-
  9 files changed, 106 insertions(+), 2 deletions(-)
  create mode 100644 tools/libxl/libxl_pvcalls.c

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 7be2592..77d1a36 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -299,6 +299,11 @@ A virtual scsi device frontend. Described by
  A virtual usb device frontend. Described by
  [xen/include/public/io/usbif.h][USBIF]
  
+ ~/device/pvcalls/$DEVID/* []
+
+Paravirtualized POSIX function calls frontend. Described by
+[docs/misc/pvcalls.markdown][PVCALLS]
+
   ~/console/* []
  
  The primary PV console device. Described in [console.txt](console.txt)
@@ -377,6 +382,10 @@ A PV SCSI backend.
  
  A PV USB backend. Described by
  [xen/include/public/io/usbif.h][USBIF]
+
+ ~/backend/pvcalls/$DOMID/$DEVID/* []
+
+A PVCalls backend. Described in [docs/misc/pvcalls.markdown][PVCALLS].
  
   ~/backend/console/$DOMID/$DEVID/* []
  
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 917ceb0..035e66e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -140,7 +140,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
libxl_9pfs.o libxl_domain.o libxl_vdispl.o \
-$(LIBXL_OBJS-y)
+libxl_pvcalls.o $(LIBXL_OBJS-y)
  LIBXL_OBJS += libxl_genid.o
  LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
  
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index eca0ea2..c4eccc5 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2006,6 +2006,16 @@ int libxl_device_p9_destroy(libxl_ctx *ctx, uint32_t domid,
  const libxl_asyncop_how *ao_how)
  LIBXL_EXTERNAL_CALLERS_ONLY;
  
+/* pvcalls interface */
+int libxl_device_pvcallsif_remove(libxl_ctx *ctx, uint32_t domid,
+  libxl_device_pvcallsif *pvcallsif,
+  const libxl_asyncop_how *ao_how)
+  LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_device_pvcallsif_destroy(libxl_ctx *ctx, uint32_t domid,
+   libxl_device_pvcallsif *pvcallsif,
+   const libxl_asyncop_how *ao_how)
+   LIBXL_EXTERNAL_CALLERS_ONLY;
+
  /* PCI Passthrough */
  int libxl_device_pci_add(libxl_ctx *ctx, uint32_t domid,
   libxl_device_pci *pcidev,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c498135..c43f391 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1374,6 +1374,10 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
  for (i = 0; i < d_config->num_p9s; i++)
  libxl__device_add(gc, domid, &libxl__p9_devtype, &d_config->p9s[i]);
  
+for (i = 0; i < d_config->num_pvcallsifs; i++)
+libxl__device_add(gc, domid, &libxl__pvcallsif_devtype,
+  &d_config->pvcallsifs[i]);
+
  switch (d_config->c_info.type) {
  case LIBXL_DOMAIN_TYPE_HVM:
  {
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 506687f..50209ff 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3648,6 +3648,7 @@ extern const struct libxl_device_type libxl__usbdev_devtype;
  extern const struct libxl_device_type libxl__pcidev_devtype;
  extern const struct libxl_device_type libxl__vdispl_devtype;
  extern const struct libxl_device_type libxl__p9_devtype;
+extern const struct libxl_device_type libxl__pvcallsif_devtype;
  
  extern const struct libxl_device_type *device_type_tbl[];
  
diff --git a/tools/libxl/libxl_pvcalls.c b/tools/libxl/libxl_pvcalls.c
new file mode 100644
index 0000000..bb6f307
--- /dev/null
+++ b/tools/libxl/libxl_pvcalls.c
@@ -0,0 +1,37 @@
+/*
+ * Copyright (C) 2018  Aporeto
+ * Author Stefano Stabellini 
+ *
+ * This program is free software; you ca

Re: [Xen-devel] [xen-4.8-testing test] 124100: regressions - FAIL

2018-07-16 Thread Jim Fehlig

On 06/13/2018 05:18 AM, Ian Jackson wrote:

> Jim: please read down to where I discuss
> test-amd64-amd64-libvirt-pair.  If you have any insight I'd appreciate
> it.  Let me know if you want me to preserve the logs, which will
> otherwise expire in a few weeks.


Whoa, sorry for the delay. This mail found a dumb bug in my filter for xen-devel 
mail.



>   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail pass in 123701
>
> From the log:
>
> 2018-06-12 20:59:40 Z executing ssh ... root@172.16.144.61 virsh migrate --live 
> debian.guest.osstest xen+ssh://joubertin0
> error: Timed out during operation: cannot acquire state change lock
> 2018-06-12 21:00:16 Z command nonzero waitstatus 256: [..]
>
> The libvirt libxl logs seem to show libxl doing a successful
> migration.


With the long delay, I'm afraid the logs have expired. Do you still see the 
problem? All the recent runs seem to be plagued with libvirt's change to require 
GnuTLS


https://libvirt.org/git/?p=libvirt.git;a=commit;h=60d9ad6f1e42618fce10baeb0f02c35e5ebd5b24


> Looking at the logs I see this:
>
> 2018-06-12 21:00:16.784+0000: 3507: warning :
> libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain
> debian.guest.osstest; current job is (modify) owned by (24947)
>
> That job number looks like it's about right for a pid, but I think it
> must be a thread because it doesn't show up in the ps output.


Likely a libvirtd worker thread doing something that requires modifying the 
state of virDomainObj.



> I did see this:
>
> Jun 12 21:00:20 joubertin0 logger: /etc/xen/scripts/vif-bridge: iptables setup 
> failed. This may affect guest networking.
>
> but that seems to be after the failure.


A wild guess, but is it possible thread 24947 is running a domain create 
operation, which includes executing vif-bridge, that is taking longer than 
expected to complete?



> I don't have an explanation.  I don't really know what this lock is.


It's a lock that serializes domain state modifications (changing virDomainObj). 
Wait time for the lock is currently hardcoded to 30sec. The thread emitting the 
warning surpassed the timeout, waiting for 24947 to finish whatever it was doing.
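
The behaviour described above can be illustrated with a small, self-contained sketch. It is not libvirt's implementation: the class and function names are hypothetical, and the wait is shortened to a fraction of a second so the example runs quickly, whereas the driver waits 30 seconds.

```python
import threading
import time


class JobLock:
    """Toy model of a per-domain job lock with a bounded wait (hypothetical)."""

    def __init__(self, wait_seconds):
        self._cond = threading.Condition()
        self._owner = None            # thread id of the current job owner
        self._wait_seconds = wait_seconds

    def begin_job(self):
        """Return True if the job was started, False if the wait timed out."""
        deadline = time.monotonic() + self._wait_seconds
        with self._cond:
            while self._owner is not None:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False      # "cannot acquire state change lock"
                self._cond.wait(remaining)
            self._owner = threading.get_ident()
            return True

    def end_job(self):
        with self._cond:
            self._owner = None
            self._cond.notify_all()


def slow_job(lock, started, duration):
    lock.begin_job()
    started.set()
    time.sleep(duration)              # stands in for a slow domain operation
    lock.end_job()


lock = JobLock(wait_seconds=0.2)
started = threading.Event()
worker = threading.Thread(target=slow_job, args=(lock, started, 0.6))
worker.start()
started.wait()

timed_out = not lock.begin_job()      # the owner is still busy, so this times out
worker.join()
acquired_later = lock.begin_job()     # succeeds once the owner has finished
lock.end_job()
print(timed_out, acquired_later)
```

The second caller gives up only because the owner holds the job for longer than the wait; nothing is deadlocked, which matches the log showing the earlier job still in progress when the warning was emitted.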


Regards,
Jim


Re: [Xen-devel] libvirtd hang on CentOS6 after latest updates

2018-07-25 Thread Jim Fehlig

On 07/22/2018 04:03 PM, Karel Hendrych wrote:
Hi, I am seeing frequent libvirtd hangs (clients not responding) after last 
CentOS6-Xen update :


xen-devel is not the best place to seek help with downstream issues, 
particularly libvirt ones :-). You would have better luck contacting the CentOS6 
maintainers.


Regards,
Jim



libvirt-libs-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-network-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-nwfilter-4.1.0-2.xen46.el6.x86_64
libgcc-4.4.7-18.el6_9.2.x86_64
2:qemu-img-0.12.1.2-2.503.el6_9.5.x86_64
libvirt-daemon-driver-storage-core-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-secret-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-interface-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-nodedev-4.1.0-2.xen46.el6.x86_64
10:centos-release-xen-common-8-4.el6.x86_64
xen-licenses-4.6.6-12.el6.x86_64
xen-libs-4.6.6-12.el6.x86_64
libvirt-daemon-driver-libxl-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-xen-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-qemu-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-storage-gluster-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-storage-logical-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-storage-mpath-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-storage-disk-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-storage-scsi-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-storage-iscsi-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-storage-4.1.0-2.xen46.el6.x86_64
libstdc++-4.4.7-18.el6_9.2.x86_64
libvirt-daemon-config-nwfilter-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-config-network-4.1.0-2.xen46.el6.x86_64
libvirt-daemon-driver-lxc-4.1.0-2.xen46.el6.x86_64
libvirt-client-4.1.0-2.xen46.el6.x86_64
linux-firmware-20171215-82.git2451bb22.el6.noarch
12:dhcp-common-4.1.1-53.P1.el6.centos.4.x86_64
12:dhclient-4.1.1-53.P1.el6.centos.4.x86_64
libvirt-4.1.0-2.xen46.el6.x86_64
10:centos-release-xen-46-8-4.el6.x86_64
10:centos-release-xen-44-8-4.el6.x86_64
tzdata-2018e-3.el6.noarch
libgomp-4.4.7-18.el6_9.2.x86_64
kernel-4.9.86-30.el6.x86_64
xen-hypervisor-4.6.6-12.el6.x86_64
xen-runtime-4.6.6-12.el6.x86_64
xen-4.6.6-12.el6.x86_64
libvirt-daemon-xen-4.1.0-2.xen46.el6.x86_64

Remedy is to kill -9 libvirtd and start again. Issue can be replicated within 
few domU starts. Usually libvirtd hangs when domU is bringing up xen drivers or 
something around udev, like:


xen_netfront: Initialising Xen virtual ethernet driver

I've been looking into libvirtd strace and debug logs, so far most suspicious in 
libvirtd debug log is this:


libvirtd.log:2018-05-22 08:32:44.760+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/tx-7'
libvirtd.log:2018-05-22 08:32:44.761+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/tx-6'
libvirtd.log:2018-05-22 08:32:44.761+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/tx-4'
libvirtd.log:2018-05-22 08:32:44.762+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/tx-5'
libvirtd.log:2018-05-22 08:32:44.763+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/tx-2'
libvirtd.log:2018-05-22 08:32:44.764+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/tx-3'
libvirtd.log:2018-05-22 08:32:44.765+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/rx-6'
libvirtd.log:2018-05-22 08:32:44.766+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/rx-5'
libvirtd.log:2018-05-22 08:32:44.767+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/rx-4'
libvirtd.log:2018-05-22 08:32:44.767+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/rx-7'
libvirtd.log:2018-05-22 08:32:44.768+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/rx-2'
libvirtd.log:2018-05-22 08:32:44.769+0000: 25455: debug : 
udevRemoveOneDevice:1289 : Failed to find device to remove that has udev name 
'/sys/devices/vif-24-0/net/vif24.0/queues/rx-3'


I could not get rid of this by reducing amount of driver queues (not sure if 
that applies to PV)


Is someone out there seeing similar issues? Anyone perhaps interested in 
reviewing full deb

Re: [Xen-devel] libvirtd hang on CentOS6 after latest updates

2018-07-25 Thread Jim Fehlig

On 07/25/2018 10:17 AM, George Dunlap wrote:

> On Wed, Jul 25, 2018 at 4:42 PM, Jim Fehlig  wrote:
>> On 07/22/2018 04:03 PM, Karel Hendrych wrote:
>>> Hi, I am seeing frequent libvirtd hangs (clients not responding) after
>>> last CentOS6-Xen update :
>>
>> xen-devel is not the best place to seek help with downstream issues,
>> particularly libvirt ones :-). You would have better luck contacting the
>> CentOS6 maintainers.
>
> In this case, it looks very much like they're using the Virt SIG
> binaries, which are pretty close to being straight-up packing of the
> upstream tarballs, and the maintainers would be Anthony & I.  And I at
> least know very little about libvirt.  If Karel had posted this on
> centos-devel, I would almost certainly have ended up asking him to
> repost to xen-devel anyway, at which point I would have cc'd you. :-)


Heh, ok :-).


> Does the error ring any bells?


The udev messages are 'debug' level (not fatal) and unrelated IMO. It would be 
best to attach gdb to the libvirtd process and get a backtrace of all threads.
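
One way to collect that backtrace non-interactively is sketched below. The helper name and the example pid are hypothetical; the gdb options used (`-p` to attach, `-batch` to exit when done, `-ex` to run a command) and the `thread apply all bt` command are standard gdb, but the exact invocation is only a suggestion. The snippet builds and prints the command line and does not run gdb.

```python
import shlex


def gdb_backtrace_argv(pid):
    """Build a gdb command line that dumps a backtrace of every thread."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


# 12345 is a placeholder; substitute the pid of the hung libvirtd process.
argv = gdb_backtrace_argv(12345)
print(shlex.join(argv))
```

Running the printed command as root against the hung daemon, and redirecting its output to a file, gives something that can be attached to a follow-up mail.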


Regards,
Jim



[Xen-devel] [OSSTEST] Install GnuTLS for libvirt builds

2018-07-30 Thread Jim Fehlig
Since libvirt commit 60d9ad6f GnuTLS is required to build libvirt. The
various libvirt build tests in osstest began failing after the commit
hit libvirt.git master. Adding libgnutls28-dev to the list of packages
needed to build libvirt will fix the currently broken builds.

Signed-off-by: Jim Fehlig 
---

I cribbed the 'libgnutls28-dev' package name from the libvirt jenkins CI

https://libvirt.org/git/?p=libvirt-jenkins-ci.git;a=blob;f=guests/vars/mappings.yml;h=be356aae616e7dacf603175fe1bea8ce398629e1;hb=HEAD#l138

 Osstest/Toolstack/libvirt.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm
index 45df173..d5cda77 100644
--- a/Osstest/Toolstack/libvirt.pm
+++ b/Osstest/Toolstack/libvirt.pm
@@ -26,7 +26,7 @@ use XML::LibXML;
 
 sub new {
 my ($class, $ho, $methname,$asset) = @_;
-my @extra_packages = qw(libavahi-client3);
+my @extra_packages = qw(libavahi-client3 libgnutls28-dev);
 my $nl_lib = "libnl-3-200";
 $nl_lib = "libnl1" if ($ho->{Suite} =~ m/wheezy/);
 push(@extra_packages, $nl_lib);
-- 
2.18.0



Re: [Xen-devel] Likely build race, "/usr/bin/ld: cannot find -lvirt"

2018-05-24 Thread Jim Fehlig

On 05/24/2018 04:27 AM, Ian Jackson wrote:

> Ian Jackson writes ("Likely build race, "/usr/bin/ld: cannot find -lvirt""):
>> tl;dr:
>>
>> I think there is a bug in libvirt's build system which, with
>> low probability, causes a build failure containing this message:
>>    /usr/bin/ld: cannot find -lvirt
>>
>> Complete build logs of two attempts:
>>    http://logs.test-lab.xenproject.org/osstest/logs/123046/build-i386-libvirt/6.ts-libvirt-build.log
>>    http://logs.test-lab.xenproject.org/osstest/logs/123096/build-i386-libvirt/6.ts-libvirt-build.log
>
> I have run a number of attempts.  Out of 5 more, 1 succeeded.  So out
> of a total of 7 attempts, 1 succeeded.  This repro rate is an IMO
> excellent opportunity to debug this race :-).


There appears to be a missing dependency between the lockd library and libvirt 
library, but my autotools skills lack the savvy to find it. Here we see the 
install command and relinking of lockd.la


 /bin/bash ../libtool   --mode=install /usr/bin/install -c   lockd.la 
'/home/osstest/build.123096.build-i386-libvirt/dist/usr/local/lib/libvirt/lock-driver'

libtool: install: warning: relinking `lockd.la'
libtool: install: (cd /home/osstest/build.123096.build-i386-libvirt/libvirt/src; 
/bin/bash /home/osstest/build.123096.build-i386-libvirt/libvirt/libtool 
--silent --tag CC --mode=relink gcc -std=gnu99 -I./conf -I/usr/include/libxml2 
-fno-common -W -Waddress -Waggressive-loop-optimizations -Wall -Wattributes 
-Wbad-function-cast -Wbuiltin-macro-redefined -Wcast-align -Wchar-subscripts 
-Wclobbered -Wcomment -Wcomments -Wcoverage-mismatch -Wcpp -Wdate-time 
-Wdeprecated-declarations -Wdiv-by-zero -Wdouble-promotion -Wempty-body 
-Wendif-labels -Wextra -Wformat-contains-nul -Wformat-extra-args 
-Wformat-security -Wformat-y2k -Wformat-zero-length -Wfree-nonheap-object 
-Wignored-qualifiers -Wimplicit -Wimplicit-function-declaration -Wimplicit-int 
-Winit-self -Winline -Wint-to-pointer-cast -Winvalid-memory-model -Winvalid-pch 
-Wjump-misses-init -Wlogical-op -Wmain -Wmaybe-uninitialized 
-Wmemset-transposed-args -Wmissing-braces -Wmissing-declarations 
-Wmissing-field-initializers -Wmissing-include-dirs -Wmissing-parameter-type 
-Wmissing-prototypes -Wmultichar -Wnarrowing -Wnested-externs -Wnonnull 
-Wold-style-declaration -Wold-style-definition -Wopenmp-simd -Woverflow 
-Woverride-init -Wpacked-bitfield-compat -Wparentheses -Wpointer-arith 
-Wpointer-sign -Wpointer-to-int-cast -Wpragmas -Wpsabi -Wreturn-local-addr 
-Wreturn-type -Wsequence-point -Wshadow -Wsizeof-pointer-memaccess 
-Wstrict-aliasing -Wstrict-prototypes -Wsuggest-attribute=const 
-Wsuggest-attribute=format -Wsuggest-attribute=noreturn -Wsuggest-attribute=pure 
-Wswitch -Wsync-nand -Wtrampolines -Wtrigraphs -Wtype-limits -Wuninitialized 
-Wunknown-pragmas -Wunused -Wunused-but-set-parameter -Wunused-but-set-variable 
-Wunused-function -Wunused-label -Wunused-local-typedefs -Wunused-parameter 
-Wunused-result -Wunused-value -Wunused-variable -Wvarargs -Wvariadic-macros 
-Wvector-operation-performance -Wvolatile-register-var -Wwrite-strings 
-Wnormalized=nfc -Wno-sign-compare -Wjump-misses-init -Wswitch-enum 
-Wno-format-nonliteral -fstack-protector-strong -fexceptions 
-fasynchronous-unwind-tables -fipa-pure-const -Wno-suggest-attribute=pure 
-Wno-suggest-attribute=const -Werror -Wframe-larger-than=4096 -g 
-I/home/osstest/build.123096.build-i386-libvirt/xendist/usr/local/include/ 
-DLIBXL_API_VERSION=0x040400 -module -avoid-version -Wl,-z -Wl,nodelete 
-export-dynamic -Wl,-z -Wl,relro -Wl,-z -Wl,now -Wl,--no-copy-dt-needed-entries 
-Wl,-z -Wl,defs -g 
-L/home/osstest/build.123096.build-i386-libvirt/xendist/usr/local/lib/ 
-Wl,-rpath-link=/home/osstest/build.123096.build-i386-libvirt/xendist/usr/local/lib/ 
-o lockd.la -rpath /usr/local/lib/libvirt/lock-driver 
locking/lockd_la-lock_driver_lockd.lo locking/lockd_la-lock_protocol.lo 
libvirt.la ../gnulib/lib/libgnu.la -ldl -inst-prefix-dir 
/home/osstest/build.123096.build-i386-libvirt/dist)

/usr/bin/ld: cannot find -lvirt
collect2: error: ld returned 1 exit status
libtool: install: error: relink `lockd.la' with the above command before 
installing it

Makefile:6410: recipe for target 'install-lockdriverLTLIBRARIES' failed

and several lines later it seems another thread finally finishes libvirt.la

libtool: install: /usr/bin/install -c .libs/libvirt.lai 
/home/osstest/build.123096.build-i386-libvirt/dist/usr/local/lib/libvirt.la


I've stared at the various Makefile.{,inc.}am files but can't spot the problem. 
Perhaps other libvirt maintainers with better autotools skills can give some hints.


Regards,
Jim



Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL

2018-08-22 Thread Jim Fehlig

On 08/21/2018 05:14 AM, Jan Beulich wrote:

> On 21.08.18 at 03:11,  wrote:
>> flight 126201 xen-4.9-testing real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>
> Something needs to be done about this, as this continued failure is
> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> for flight 125710, I've got back from Wei:
>
>> This is libvirtd's error message.
>>
>> The remote host can't obtain the state change lock because it is already
>> held by another task/thread. It could be a libvirt / libxl bug.
>>
>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>> Cannot start job (modify) for domain debian.guest.osstest; current job is 
>> (modify) owned by (24975)


I took a closer look at the logs and it appears the finish phase of migration 
fails to acquire the domain job lock since it is already held by the perform 
phase. In the perform phase, after the vm has been transferred to the dst, the 
qemu process associated with the vm is started. For whatever reason that takes a 
long time on this host:


2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm: 
Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with 
arguments: ...
2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain 
1 device model: spawn watch p=(null)
...
2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback: watch 
w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1: event 
epath=/local/domain/0/device-model/1/state
2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain 
1 device model: spawn watch p=running


In the meantime we move to the finish phase and time out waiting for the above 
perform phase to complete:


2018-08-19 17:05:19.096+: 3492: debug : virThreadJobSet:96 : Thread 3492 
(virNetServerHandleJob) is now running job remoteDispatchDomainMigrateFinish3Params

...
2018-08-19 17:05:49.253+: 3492: warning : libxlDomainObjBeginJob:151 : 
Cannot start job (modify) for domain debian.guest.osstest; current job is 
(modify) owned by (24982)
2018-08-19 17:05:49.253+: 3492: error : libxlDomainObjBeginJob:155 : Timed 
out during operation: cannot acquire state change lock


What could be causing the long startup time of qemu on these hosts? Does dom0 
have enough cpu/memory? As you noticed, the libvirt commit used for this test 
has not changed in a long time, well before the failures appeared. Perhaps a 
subtle change in libxl is exposing the bug?


Regardless, I'm happy to have looked at the issue since I think libvirt can be 
improved to cope with the problem. The thread running in the dst receiving the 
vm via libxl_domain_create_restore() can be created with joinable flag, then 
joined in the finish phase before attempting to acquire the job lock. I'll look 
into making such an improvement in libvirt's libxl driver.
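
A minimal sketch of that idea, using plain POSIX threads rather than libvirt's own thread wrappers. The demo_* names and the struct are hypothetical stand-ins, not the libxl driver's code: the receive thread is created joinable in the prepare step and joined in the finish step before any attempt to take the job lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-domain state; not libvirt's real types. */
struct demo_domain {
    pthread_t receive_thr;   /* thread receiving the incoming VM */
    bool receive_started;    /* true while receive_thr still needs a join */
    bool restore_done;       /* set by the receive thread when it finishes */
};

/* Stand-in for the slow restore (e.g. waiting for the device model). */
static void *demo_receive(void *opaque)
{
    struct demo_domain *dom = opaque;

    dom->restore_done = true;
    return NULL;
}

/* Prepare phase: start the receive thread joinable (the pthread default). */
int demo_prepare(struct demo_domain *dom)
{
    dom->restore_done = false;
    if (pthread_create(&dom->receive_thr, NULL, demo_receive, dom) != 0)
        return -1;
    dom->receive_started = true;
    return 0;
}

/* Finish phase: join first, so the restore is complete before any locking. */
int demo_finish(struct demo_domain *dom)
{
    if (dom->receive_started) {
        if (pthread_join(dom->receive_thr, NULL) != 0)
            return -1;
        dom->receive_started = false;
    }
    /* The job lock would be taken here; the receive thread has exited. */
    return dom->restore_done ? 0 : -1;
}
```

Joining also removes the dependency on the job wait timeout: the finish step simply blocks until the receive thread has exited, however long the device model takes to come up.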


Regards,
Jim


Re: [Xen-devel] [libvirt test] 126429: regressions - FAIL

2018-08-24 Thread Jim Fehlig

On 08/24/2018 04:48 AM, Wei Liu wrote:
> On Fri, Aug 24, 2018 at 10:25:49AM +, osstest service owner wrote:
>> flight 126429 libvirt real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/126429/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>   build-i386-libvirt    6 libvirt-build    fail REGR. vs. 123814
>>   build-amd64-libvirt   6 libvirt-build    fail REGR. vs. 123814
>>   build-armhf-libvirt   6 libvirt-build    fail REGR. vs. 123814
>
> Missing build dependency in osstest.  GnuTLS has become a hard
> requirement.


I mentioned this a while ago, and even sent a patch :-)

https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg02584.html

Regards,
Jim


[Xen-devel] [PATCH 0/5] libxl: various migration V3 improvements

2018-09-05 Thread Jim Fehlig
Patch 5 fixes a long-standing problem exposed by some very slow hosts in
xen's osstest

https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg01945.html

While working on the fix, I discovered other problems in libxl's V3
migration protocol. E.g. a modify job on the migrating VM was not
handled properly across the phases on either src or dst host. Patches
1-4 fix this and other problems found along the way.

Jim Fehlig (5):
  libxl: migration: defer removing VM until finish phase
  libxl: fix logic in P2P migration
  libxl: fix job handling across migration phases on src
  libxl: fix job handling across migration phases on dst
  libxl: join with thread receiving migration data

 src/libxl/libxl_domain.h|   1 +
 src/libxl/libxl_driver.c|   7 --
 src/libxl/libxl_migration.c | 168 
 3 files changed, 114 insertions(+), 62 deletions(-)

-- 
2.18.0



[Xen-devel] [PATCH 3/5] libxl: fix job handling across migration phases on src

2018-09-05 Thread Jim Fehlig
The libxlDomainMigrationSrc* functions are a bit flawed in their
handling of modify jobs. A job begins at the start of the begin
phase but ends before the phase completes. No job is running for
the remaining phases of migration on the source host.

Change the logic to keep the job running after a successful begin
phase, and end the job in the confirm phase. The job must also end
in the perform phase in the case of error since confirm phase would
not be executed.

Signed-off-by: Jim Fehlig 
---
 src/libxl/libxl_migration.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index e4f2895690..191973edeb 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -399,6 +399,11 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
 virDomainDefPtr def;
 char *xml = NULL;
 
+/*
+ * In the case of successful migration, a job is started here and
+ * terminated in the confirm phase. Errors in the begin or perform
+ * phase will also terminate the job.
+ */
 if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
 goto cleanup;
 
@@ -428,6 +433,9 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
 goto endjob;
 
 xml = virDomainDefFormat(def, cfg->caps, VIR_DOMAIN_DEF_FORMAT_SECURE);
+/* Valid xml means success! EndJob in the confirm phase */
+if (xml)
+goto cleanup;
 
  endjob:
 libxlDomainObjEndJob(driver, vm);
@@ -1169,6 +1177,14 @@ libxlDomainMigrationSrcPerformP2P(libxlDriverPrivatePtr driver,
 ret = libxlDoMigrateSrcP2P(driver, vm, sconn, xmlin, dconn, dconnuri,
dname, uri_str, flags);
 
+if (ret < 0) {
+/*
+ * Confirm phase will not be executed if perform fails. End the
+ * job started in begin phase.
+ */
+libxlDomainObjEndJob(driver, vm);
+}
+
  cleanup:
 orig_err = virSaveLastError();
 virObjectUnlock(vm);
@@ -1232,11 +1248,17 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivatePtr driver,
 ret = libxlDoMigrateSrcSend(driver, vm, flags, sockfd);
 virObjectLock(vm);
 
-if (ret < 0)
+if (ret < 0) {
 virDomainLockProcessResume(driver->lockManager,
"xen:///system",
vm,
priv->lockState);
+/*
+ * Confirm phase will not be executed if perform fails. End the
+ * job started in begin phase.
+ */
+libxlDomainObjEndJob(driver, vm);
+}
 
  cleanup:
 VIR_FORCE_CLOSE(sockfd);
@@ -1386,6 +1408,8 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivatePtr driver,
 ret = 0;
 
  cleanup:
+/* EndJob for corresponding BeginJob in begin phase */
+libxlDomainObjEndJob(driver, vm);
 virObjectEventStateQueue(driver->domainEventState, event);
 virObjectUnref(cfg);
 return ret;
-- 
2.18.0
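
The job lifetime described in the commit message above can be modeled with a tiny state machine. This is only a sketch under simplifying assumptions: the demo_* names are hypothetical and a single flag stands in for the real job machinery behind libxlDomainObjBeginJob()/libxlDomainObjEndJob().

```c
#include <stdbool.h>

/* Hypothetical model of the source-side job; not the libxl driver's types. */
struct demo_src {
    bool job_active;
};

/* Begin phase: start the job; keep it on success, end it on failure. */
int demo_src_begin(struct demo_src *s, bool begin_ok)
{
    if (s->job_active)
        return -1;              /* another job already owns the domain */
    s->job_active = true;
    if (!begin_ok) {
        s->job_active = false;  /* error in begin: end the job here */
        return -1;
    }
    return 0;                   /* success: the job stays active */
}

/* Perform phase: on failure confirm never runs, so end the job here. */
int demo_src_perform(struct demo_src *s, bool perform_ok)
{
    if (!perform_ok) {
        s->job_active = false;
        return -1;
    }
    return 0;
}

/* Confirm phase: always ends the job started in the begin phase. */
void demo_src_confirm(struct demo_src *s)
{
    s->job_active = false;
}
```

In this model the flag is set for the whole begin, perform, confirm sequence, and every error exit clears it exactly once.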



[Xen-devel] [PATCH 2/5] libxl: fix logic in P2P migration

2018-09-05 Thread Jim Fehlig
libxlDoMigrateSrcP2P() performs all phases of the migration
protocol for peer-to-peer migration. Unfortunately the logic
was a bit flawed since it is possible to skip the confirm
phase after a successful begin and prepare phase. Fix the
logic to always call the confirm phase after a successful begin
and perform. Skip the confirm phase if begin or perform fail.

Signed-off-by: Jim Fehlig 
---
 src/libxl/libxl_migration.c | 48 ++---
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index 97f72d0390..e4f2895690 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -972,30 +972,35 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
 char *cookieout = NULL;
 int cookieoutlen;
 bool cancelled = true;
+bool notify_source = true;
 virErrorPtr orig_err = NULL;
 int ret = -1;
 /* For tunnel migration */
 virStreamPtr st = NULL;
 struct libxlTunnelControl *tc = NULL;
 
+if (dname &&
+virTypedParamsAddString(&params, &nparams, &maxparams,
+VIR_MIGRATE_PARAM_DEST_NAME, dname) < 0)
+goto cleanup;
+
+if (uri &&
+virTypedParamsAddString(&params, &nparams, &maxparams,
+VIR_MIGRATE_PARAM_URI, uri) < 0)
+goto cleanup;
+
 dom_xml = libxlDomainMigrationSrcBegin(sconn, vm, xmlin,
&cookieout, &cookieoutlen);
+/*
+ * If dom_xml is non-NULL the begin phase has succeeded, and the
+ * confirm phase must be called to cleanup the migration operation.
+ */
 if (!dom_xml)
 goto cleanup;
 
 if (virTypedParamsAddString(&params, &nparams, &maxparams,
 VIR_MIGRATE_PARAM_DEST_XML, dom_xml) < 0)
-goto cleanup;
-
-if (dname &&
-virTypedParamsAddString(&params, &nparams, &maxparams,
-VIR_MIGRATE_PARAM_DEST_NAME, dname) < 0)
-goto cleanup;
-
-if (uri &&
-virTypedParamsAddString(&params, &nparams, &maxparams,
-VIR_MIGRATE_PARAM_URI, uri) < 0)
-goto cleanup;
+goto confirm;
 
 /* We don't require the destination to have P2P support
  * as it looks to be normal migration from the receiver perpective.
@@ -1006,7 +1011,7 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
 virObjectUnlock(vm);
 if (flags & VIR_MIGRATE_TUNNELLED) {
 if (!(st = virStreamNew(dconn, 0)))
-goto cleanup;
+goto confirm;
 ret = dconn->driver->domainMigratePrepareTunnel3Params
 (dconn, st, params, nparams, cookieout, cookieoutlen, NULL, NULL, destflags);
 } else {
@@ -1016,7 +1021,7 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
 virObjectLock(vm);
 
 if (ret == -1)
-goto cleanup;
+goto confirm;
 
 if (!(flags & VIR_MIGRATE_TUNNELLED)) {
 if (uri_out) {
@@ -1038,8 +1043,10 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
 else
 ret = libxlDomainMigrationSrcPerform(driver, vm, NULL, NULL,
  uri_out, NULL, flags);
-if (ret < 0)
+if (ret < 0) {
+notify_source = false;
 orig_err = virSaveLastError();
+}
 
 cancelled = (ret < 0);
 
@@ -1067,12 +1074,15 @@ libxlDoMigrateSrcP2P(libxlDriverPrivatePtr driver,
 if (!orig_err)
 orig_err = virSaveLastError();
 
-VIR_DEBUG("Confirm3 cancelled=%d vm=%p", cancelled, vm);
-ret = libxlDomainMigrationSrcConfirm(driver, vm, flags, cancelled);
+ confirm:
+if (notify_source) {
+VIR_DEBUG("Confirm3 cancelled=%d vm=%p", cancelled, vm);
+ret = libxlDomainMigrationSrcConfirm(driver, vm, flags, cancelled);
 
-if (ret < 0)
-VIR_WARN("Guest %s probably left in 'paused' state on source",
- vm->def->name);
+if (ret < 0)
+VIR_WARN("Guest %s probably left in 'paused' state on source",
+ vm->def->name);
+}
 
  cleanup:
 if (flags & VIR_MIGRATE_TUNNELLED) {
-- 
2.18.0
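
The control flow this patch aims for can be sketched in isolation. The function below is a hypothetical model, not the driver code: the three booleans stand in for the outcome of the begin, prepare and perform phases, and the result records whether confirm ran and with which cancelled value.

```c
#include <stdbool.h>

/* Outcome of one simulated peer-to-peer migration (hypothetical model). */
struct demo_p2p_result {
    bool confirm_called;  /* did the confirm phase run? */
    bool cancelled;       /* value passed to confirm, if it ran */
    int ret;              /* 0 on success, -1 on failure */
};

struct demo_p2p_result demo_p2p(bool begin_ok, bool prepare_ok, bool perform_ok)
{
    struct demo_p2p_result r = { false, true, -1 };
    bool notify_source = true;

    if (!begin_ok)
        goto cleanup;            /* begin failed: nothing to confirm */

    if (!prepare_ok)
        goto confirm;            /* begin succeeded: confirm must run */

    if (!perform_ok)
        notify_source = false;   /* perform failed: skip confirm */
    else
        r.cancelled = false;

    /* The finish phase on the destination would run here. */

 confirm:
    if (notify_source) {
        r.confirm_called = true;
        r.ret = r.cancelled ? -1 : 0;
    }

 cleanup:
    return r;
}
```

A prepare failure therefore still reaches confirm with cancelled set, while begin and perform failures bypass it.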



[Xen-devel] [PATCH 5/5] libxl: join with thread receiving migration data

2018-09-05 Thread Jim Fehlig
It is possible the incoming VM is not fully started when the finish
phase of migration is executed. In libxlDomainMigrationDstFinish,
wait for the thread receiving the VM to complete before executing
finish phase tasks.

Signed-off-by: Jim Fehlig 
---
 src/libxl/libxl_domain.h|  1 +
 src/libxl/libxl_migration.c | 20 
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/libxl/libxl_domain.h b/src/libxl/libxl_domain.h
index 5d83230cd6..e193881450 100644
--- a/src/libxl/libxl_domain.h
+++ b/src/libxl/libxl_domain.h
@@ -65,6 +65,7 @@ struct _libxlDomainObjPrivate {
 /* console */
 virChrdevsPtr devs;
 libxl_evgen_domain_death *deathW;
+virThreadPtr migrationDstReceiveThr;
 unsigned short migrationPort;
 char *lockState;
 
diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index 54b01a3169..fc7ccb53d0 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -297,9 +297,9 @@ libxlMigrateDstReceive(virNetSocketPtr sock,
 libxlMigrationDstArgs *args = opaque;
 virNetSocketPtr *socks = args->socks;
 size_t nsocks = args->nsocks;
+libxlDomainObjPrivatePtr priv = args->vm->privateData;
 virNetSocketPtr client_sock;
 int recvfd = -1;
-virThread thread;
 size_t i;
 
 /* Accept migration connection */
@@ -318,7 +318,10 @@ libxlMigrateDstReceive(virNetSocketPtr sock,
  * the migration data
  */
 args->recvfd = recvfd;
-if (virThreadCreate(&thread, false,
+VIR_FREE(priv->migrationDstReceiveThr);
+if (VIR_ALLOC(priv->migrationDstReceiveThr) < 0)
+goto fail;
+if (virThreadCreate(priv->migrationDstReceiveThr, true,
 libxlDoMigrateDstReceive, args) < 0) {
 virReportError(VIR_ERR_OPERATION_FAILED, "%s",
_("Failed to create thread for receiving migration data"));
@@ -557,7 +560,6 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
 libxlDriverPrivatePtr driver = dconn->privateData;
 virDomainObjPtr vm = NULL;
 libxlMigrationDstArgs *args = NULL;
-virThread thread;
 bool taint_hook = false;
 libxlDomainObjPrivatePtr priv = NULL;
 char *xmlout = NULL;
@@ -617,7 +619,10 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
 args->nsocks = 0;
 mig = NULL;
 
-if (virThreadCreate(&thread, false, libxlDoMigrateDstReceive, args) < 0) {
+VIR_FREE(priv->migrationDstReceiveThr);
+if (VIR_ALLOC(priv->migrationDstReceiveThr) < 0)
+goto error;
+if (virThreadCreate(priv->migrationDstReceiveThr, true, libxlDoMigrateDstReceive, args) < 0) {
 virReportError(VIR_ERR_OPERATION_FAILED, "%s",
_("Failed to create thread for receiving migration data"));
 goto endjob;
@@ -1291,6 +1296,13 @@ libxlDomainMigrationDstFinish(virConnectPtr dconn,
 virObjectEventPtr event = NULL;
 virDomainPtr dom = NULL;
 
+if (priv->migrationDstReceiveThr) {
+virObjectUnlock(vm);
+virThreadJoin(priv->migrationDstReceiveThr);
+virObjectLock(vm);
+VIR_FREE(priv->migrationDstReceiveThr);
+}
+
 virPortAllocatorRelease(priv->migrationPort);
 priv->migrationPort = 0;
 
-- 
2.18.0



[Xen-devel] [PATCH 1/5] libxl: migration: defer removing VM until finish phase

2018-09-05 Thread Jim Fehlig
If for any reason the restore of a VM fails on the destination host
in a migration operation, the VM is removed (if not persistent) from
the virDomainObjList, meaning it is no longer available for additional
cleanup or processing in the finish phase. Defer removing the VM from
the virDomainObjList until the finish phase, which already contains
logic to remove the VM.

Signed-off-by: Jim Fehlig 
---
 src/libxl/libxl_migration.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index b2e5847c58..97f72d0390 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -264,7 +264,6 @@ libxlDoMigrateDstReceive(void *opaque)
 libxlDriverPrivatePtr driver = args->conn->privateData;
 int recvfd = args->recvfd;
 size_t i;
-int ret;
 
 virObjectRef(vm);
 virObjectLock(vm);
@@ -274,12 +273,10 @@ libxlDoMigrateDstReceive(void *opaque)
 /*
  * Always start the domain paused.  If needed, unpause in the
  * finish phase, after transfer of the domain is complete.
+ * Errors and cleanup are also handled in the finish phase.
  */
-ret = libxlDomainStartRestore(driver, vm, true, recvfd,
-  args->migcookie->xenMigStreamVer);
-
-if (ret < 0 && !vm->persistent)
-virDomainObjListRemove(driver->domains, vm);
+libxlDomainStartRestore(driver, vm, true, recvfd,
+args->migcookie->xenMigStreamVer);
 
 /* Remove all listen socks from event handler, and close them. */
 for (i = 0; i < nsocks; i++) {
-- 
2.18.0
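
As a rough illustration of the ownership change in this patch, here is a sketch with hypothetical demo_* names and a plain flag in place of the virDomainObjList. The restore step only records its result; the finish step is the single place that unlists a failed, non-persistent VM.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a domain object on the destination. */
struct demo_vm {
    bool persistent;   /* persistent domains are never unlisted */
    bool running;      /* set when the restore succeeded */
    bool listed;       /* still present in the domain list */
};

/* Receive thread: record the restore result, but never unlist the VM. */
void demo_dst_restore(struct demo_vm *vm, bool restore_ok)
{
    vm->running = restore_ok;
}

/* Finish phase: the one place where a failed transient VM is unlisted. */
int demo_dst_finish(struct demo_vm *vm)
{
    if (vm->running)
        return 0;
    if (!vm->persistent)
        vm->listed = false;
    return -1;
}
```

Between the two calls the VM stays listed even after a failed restore, which is what lets the finish phase look it up and clean up.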



[Xen-devel] [PATCH 4/5] libxl: fix job handling across migration phases on dst

2018-09-05 Thread Jim Fehlig
The libxlDomainMigrationDst* functions are a bit flawed in their
handling of modify jobs. A job begins when the destination host
begins receiving the incoming VM and ends after the VM is started.
The finish phase contains another BeginJob/EndJob sequence.

This patch changes the logic to begin a job for the incoming VM
in the prepare phase and end the job in the finish phase.

Signed-off-by: Jim Fehlig 
---
 src/libxl/libxl_driver.c|  7 
 src/libxl/libxl_migration.c | 65 +++--
 2 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 5a5e792957..73c2ff3546 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -6020,15 +6020,8 @@ libxlDomainMigrateFinish3Params(virConnectPtr dconn,
 return NULL;
 }
 
-if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) {
-virDomainObjEndAPI(&vm);
-return NULL;
-}
-
 ret = libxlDomainMigrationDstFinish(dconn, vm, flags, cancelled);
 
-libxlDomainObjEndJob(driver, vm);
-
 virDomainObjEndAPI(&vm);
 
 return ret;
diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index 191973edeb..54b01a3169 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -266,9 +266,6 @@ libxlDoMigrateDstReceive(void *opaque)
 size_t i;
 
 virObjectRef(vm);
-virObjectLock(vm);
-if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
-goto cleanup;
 
 /*
  * Always start the domain paused.  If needed, unpause in the
@@ -288,10 +285,6 @@ libxlDoMigrateDstReceive(void *opaque)
 args->nsocks = 0;
 VIR_FORCE_CLOSE(recvfd);
 virObjectUnref(args);
-
-libxlDomainObjEndJob(driver, vm);
-
- cleanup:
 virDomainObjEndAPI(&vm);
 }
 
@@ -583,6 +576,13 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
 goto error;
 *def = NULL;
 
+/*
+ * Unless an error is encountered in this function, the job will
+ * be terminated in the finish phase.
+ */
+if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
+goto error;
+
 priv = vm->privateData;
 
 if (taint_hook) {
@@ -595,18 +595,18 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
  * stream -> pipe -> recvfd of libxlDomainStartRestore
  */
 if (pipe(dataFD) < 0)
-goto error;
+goto endjob;
 
 /* Stream data will be written to pipeIn */
 if (virFDStreamOpen(st, dataFD[1]) < 0)
-goto error;
+goto endjob;
 dataFD[1] = -1; /* 'st' owns the FD now & will close it */
 
 if (libxlMigrationDstArgsInitialize() < 0)
-goto error;
+goto endjob;
 
 if (!(args = virObjectNew(libxlMigrationDstArgsClass)))
-goto error;
+goto endjob;
 
 args->conn = virObjectRef(dconn);
 args->vm = virObjectRef(vm);
@@ -620,12 +620,15 @@ libxlDomainMigrationDstPrepareTunnel3(virConnectPtr dconn,
 if (virThreadCreate(&thread, false, libxlDoMigrateDstReceive, args) < 0) {
 virReportError(VIR_ERR_OPERATION_FAILED, "%s",
_("Failed to create thread for receiving migration data"));
-goto error;
+goto endjob;
 }
 
 ret = 0;
 goto done;
 
+ endjob:
+libxlDomainObjEndJob(driver, vm);
+
  error:
 libxlMigrationCookieFree(mig);
 VIR_FORCE_CLOSE(dataFD[1]);
@@ -679,6 +682,13 @@ libxlDomainMigrationDstPrepare(virConnectPtr dconn,
 goto error;
 *def = NULL;
 
+/*
+ * Unless an error is encountered in this function, the job will
+ * be terminated in the finish phase.
+ */
+if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
+goto error;
+
 priv = vm->privateData;
 
 if (taint_hook) {
@@ -689,27 +699,27 @@ libxlDomainMigrationDstPrepare(virConnectPtr dconn,
 /* Create socket connection to receive migration data */
 if (!uri_in) {
 if ((hostname = virGetHostname()) == NULL)
-goto error;
+goto endjob;
 
 if (STRPREFIX(hostname, "localhost")) {
 virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("hostname on destination resolved to localhost,"
  " but migration requires an FQDN"));
-goto error;
+goto endjob;
 }
 
 if (virPortAllocatorAcquire(driver->migrationPorts, &port) < 0)
-goto error;
+goto endjob;
 
 priv->migrationPort = port;
 if (virAsprintf(uri_out, "tcp://%s:%d", hostname, port) < 0)
-goto error;
+goto endjob;
 } else {
 if (!(STRPREFIX(uri_in, "tcp://"))) {
 /* not full URI, add prefix tcp:// */
  

Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL

2018-09-05 Thread Jim Fehlig

On 08/24/2018 02:58 AM, Wei Liu wrote:
> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>> On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>> On 21.08.18 at 03:11,  wrote:
>>>> flight 126201 xen-4.9-testing real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>>>
>>> Something needs to be done about this, as this continued failure is
>>> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
>>> for flight 125710, I've got back from Wei:
>>>
>>>> This is libvirtd's error message.
>>>>
>>>> The remote host can't obtain the state change lock because it is already
>>>> held by another task/thread. It could be a libvirt / libxl bug.
>>>>
>>>> 2018-08-01 16:12:13.433+: 3491: warning : libxlDomainObjBeginJob:151 :
>>>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>>
>> I took a closer look at the logs and it appears the finish phase of
>> migration fails to acquire the domain job lock since it is already held by
>> the perform phase. In the perform phase, after the vm has been transferred
>> to the dst, the qemu process associated with the vm is started. For whatever
>> reason that takes a long time on this host:
>>
>> 2018-08-19 17:05:19.182+: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
>> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
>> arguments: ...
>> 2018-08-19 17:05:19.188+: libxl: libxl_exec.c:398:spawn_watch_event:
>> domain 1 device model: spawn watch p=(null)
>
> This is a spurious event after the watch has been set up.
>
>> ...
>> 2018-08-19 17:05:51.529+: libxl: libxl_event.c:573:watchfd_callback:
>> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
>> event epath=/local/domain/0/device-model/1/state
>> 2018-08-19 17:05:51.529+: libxl: libxl_exec.c:398:spawn_watch_event:
>> domain 1 device model: spawn watch p=running
>
> So it has taken 32s for QEMU to write "running" in xenstore. This,
> however, is still within the timeout limit set by libxl (60s).

Right, but it is not within libvirt's job wait timeout, which is 30s.
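
To make the arithmetic concrete, here is a small sketch (helper names are hypothetical) that turns two HH:MM:SS.mmm log timestamps into an elapsed time and checks it against a timeout given in seconds.

```c
#include <stdbool.h>

/* Milliseconds since midnight for an HH:MM:SS.mmm log timestamp. */
long demo_ms_of_day(int h, int m, int s, int ms)
{
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* True if an operation taking elapsed_ms fits within timeout_s seconds. */
bool demo_within_timeout(long elapsed_ms, long timeout_s)
{
    return elapsed_ms <= timeout_s * 1000L;
}
```

With the device-model timestamps from the log, 17:05:19.182 and 17:05:51.529, the elapsed time is 32347 ms: inside libxl's 60s limit, but past libvirt's 30s job wait.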

I've sent a series to fix this and other problems I found while 
testing/debugging

https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html

Assuming those patches are committed to libvirt.git master, it's not clear how 
they will improve this and other tests that use an older, fixed libvirt commit.


Regards,
Jim


[Xen-devel] [PATCH OSSTEST] Install GnuTLS for libvirt builds

2018-09-05 Thread Jim Fehlig
Since libvirt commit 60d9ad6f GnuTLS is required to build libvirt. The
various libvirt build tests in osstest began failing after the commit
hit libvirt.git master. Adding libgnutls28-dev to the list of packages
needed to build libvirt will fix the currently broken builds.

Signed-off-by: Jim Fehlig 
---

Rebase and repost of

https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg02584.html

 Osstest/Toolstack/libvirt.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm
index 45df173..d5cda77 100644
--- a/Osstest/Toolstack/libvirt.pm
+++ b/Osstest/Toolstack/libvirt.pm
@@ -26,7 +26,7 @@ use XML::LibXML;
 
 sub new {
 my ($class, $ho, $methname,$asset) = @_;
-my @extra_packages = qw(libavahi-client3);
+my @extra_packages = qw(libavahi-client3 libgnutls28-dev);
 my $nl_lib = "libnl-3-200";
 $nl_lib = "libnl1" if ($ho->{Suite} =~ m/wheezy/);
 push(@extra_packages, $nl_lib);
-- 
2.18.0



Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL

2018-09-11 Thread Jim Fehlig

On 9/5/18 3:37 PM, Jim Fehlig wrote:
> On 08/24/2018 02:58 AM, Wei Liu wrote:
>> On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
>>> On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>> On 21.08.18 at 03:11,  wrote:
>>>>> flight 126201 xen-4.9-testing real [real]
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>>>>
>>>>> Regressions :-(
>>>>>
>>>>> Tests which did not succeed and are blocking,
>>>>> including tests which could not be run:
>>>>>   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>>>>
>>>> Something needs to be done about this, as this continued failure is
>>>> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
>>>> for flight 125710, I've got back from Wei:
>>>>
>>>>> This is libvirtd's error message.
>>>>>
>>>>> The remote host can't obtain the state change lock because it is already
>>>>> held by another task/thread. It could be a libvirt / libxl bug.
>>>>>
>>>>> 2018-08-01 16:12:13.433+: 3491: warning : libxlDomainObjBeginJob:151 :
>>>>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>>>
>>> I took a closer look at the logs and it appears the finish phase of
>>> migration fails to acquire the domain job lock since it is already held by
>>> the perform phase. In the perform phase, after the vm has been transferred
>>> to the dst, the qemu process associated with the vm is started. For whatever
>>> reason that takes a long time on this host:
>>>
>>> 2018-08-19 17:05:19.182+: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
>>> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
>>> arguments: ...
>>> 2018-08-19 17:05:19.188+: libxl: libxl_exec.c:398:spawn_watch_event:
>>> domain 1 device model: spawn watch p=(null)
>>
>> This is a spurious event after the watch has been set up.
>>
>>> ...
>>> 2018-08-19 17:05:51.529+: libxl: libxl_event.c:573:watchfd_callback:
>>> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
>>> event epath=/local/domain/0/device-model/1/state
>>> 2018-08-19 17:05:51.529+: libxl: libxl_exec.c:398:spawn_watch_event:
>>> domain 1 device model: spawn watch p=running
>>
>> So it has taken 32s for QEMU to write "running" in xenstore. This,
>> however, is still within the timeout limit set by libxl (60s).
>
> Right, but it is not within libvirt's job wait timeout, which is 30s.
>
> I've sent a series to fix this and other problems I found while testing/debugging
>
> https://www.redhat.com/archives/libvir-list/2018-September/msg00178.html
>
> Assuming those patches are committed to libvirt.git master, it's not clear how
> they will improve this and other tests that use an older, fixed libvirt commit.


FYI, the patches fixing this problem from the libvirt side have been committed 
to libvirt.git master now. See commits 60b4fd90, e39c66d3, 47da84e0, 0149464a,
and 5ea2abb3.


Regards,
Jim


Re: [Xen-devel] virsh support?

2018-09-14 Thread Jim Fehlig

On 9/14/18 8:08 AM, Dag Nygren wrote:
> Hi!
>
> Can someone inform me on XEN vtpm support in
> libvirt? From which version if so?

FYI, questions regarding libvirt are better directed to libvirt-l...@redhat.com

> Asking because I tried to do a "virsh dumpxml" on a XEN machine
> with vtpm attached and "xl list -l" lists it fine
> but there is nothing in the dumpxml result??


The libxl driver in libvirt does not support vtpm, and AFAIK no one is working 
on that. Patches welcome :-).


Regards,
Jim


Re: [Xen-devel] [OSSTEST PATCH v2] ts-xen-build-prep: install libgnutls28-dev for libvirt build

2018-09-24 Thread Jim Fehlig

On 9/24/18 3:49 AM, Wei Liu wrote:
> d54ecf31b2 placed the build dependency in a wrong file. This patch
> adds the dependency to the right file. Add a runtime dependency in
> libvirt.pm.

Thanks for fixing my fix :-).

Regards,
Jim

> Signed-off-by: Wei Liu 
> ---
> Cc: Jim Fehlig 
> ---
>   Osstest/Toolstack/libvirt.pm | 2 +-
>   ts-xen-build-prep            | 3 ++-
>   2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm
> index d5cda77e..13f92dae 100644
> --- a/Osstest/Toolstack/libvirt.pm
> +++ b/Osstest/Toolstack/libvirt.pm
> @@ -26,7 +26,7 @@ use XML::LibXML;
>
>   sub new {
>   my ($class, $ho, $methname,$asset) = @_;
> -my @extra_packages = qw(libavahi-client3 libgnutls28-dev);
> +my @extra_packages = qw(libavahi-client3 libgnutls30);
>   my $nl_lib = "libnl-3-200";
>   $nl_lib = "libnl1" if ($ho->{Suite} =~ m/wheezy/);
>   push(@extra_packages, $nl_lib);
> diff --git a/ts-xen-build-prep b/ts-xen-build-prep
> index 77a2d284..23bbbeb9 100755
> --- a/ts-xen-build-prep
> +++ b/ts-xen-build-prep
> @@ -208,7 +208,8 @@ sub prep () {
> libxml2-utils libxml2-dev
> libdevmapper-dev w3c-dtd-xhtml libxml-xpath-perl
> libelf-dev
> -  ccache nasm checkpolicy ebtables);
> +  ccache nasm checkpolicy ebtables
> +  libgnutls28-dev);
>
>   if ($ho->{Suite} !~ m/squeeze|wheezy/) {
> push(@packages, qw(ocaml-nox ocaml-findlib));




[Xen-devel] [PATCH] libxl: set channel devid when not provided by application

2018-02-07 Thread Jim Fehlig
Applications like libvirt may not populate a device devid field,
delegating that to libxl. If needed, the application can later
retrieve the libxl-produced devid. Indeed most devices are handled
this way in libvirt, channel devices included.

This works well when only one channel device is defined, but more
than one results in

qemu-system-i386: -chardev socket,id=libxl-channel-1,\
path=/tmp/test-org.qemu.guest_agent.00,server,nowait:
Duplicate ID 'libxl-channel-1' for chardev

Besides the odd '-1' value in the id, multiple channels have the same
id, causing qemu to fail. A simple fix is to set an uninitialized
devid (-1) to the dev_num passed to libxl__init_console_from_channel().

Signed-off-by: Jim Fehlig 
---

I get the feeling that if needed devid should be set earlier, but
this seems like the most opportune spot. Suggestions for improvements
welcome.

 tools/libxl/libxl_console.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c
index 39d8430df8..8faf3a24f3 100644
--- a/tools/libxl/libxl_console.c
+++ b/tools/libxl/libxl_console.c
@@ -401,6 +401,9 @@ int libxl__init_console_from_channel(libxl__gc *gc,
 
 /* Perform validation first, allocate second. */
 
+if (channel->devid == -1)
+channel->devid = dev_num;
+
 if (!channel->name) {
 LOG(ERROR, "channel %d has no name", channel->devid);
 return ERROR_INVAL;
-- 
2.16.0



[Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-21 Thread Jim Fehlig
On several Skylake machines I've observed xl segfaults when running
create or destroy subcommands. Other subcommands may segfault too,
but I've only looked at create and destroy which share a similar
backtrace

Thread 2 (Thread 0x77ff3700 (LWP 2941)):
at /usr/include/bits/unistd.h:44
at xs.c:398
fd=) at xs.c:1231

Thread 1 has canceled Thread 2 and is waiting for it in pthread_join().

The backtrace smelled of memory/stack overflow, which was verified by
increasing DEFAULT_THREAD_STACKSIZE to 32kb. Presumably the stack
overflow is observed on Skylake due to a broader CPU feature set which
must be saved within _dl_runtime_resolve and friends.

While PTHREAD_STACK_MIN should advertise a suitable stack size based on
the underlying system, increasing the default size makes xenstore a bit
more robust on systems with insufficient/broken minimums.

Signed-off-by: Jim Fehlig 
---
 tools/xenstore/xs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index abffd9cd80..3891e4907c 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -800,7 +800,7 @@ bool xs_watch(struct xs_handle *h, const char *path, const char *token)
struct iovec iov[2];
 
 #ifdef USE_PTHREAD
-#define DEFAULT_THREAD_STACKSIZE (16 * 1024)
+#define DEFAULT_THREAD_STACKSIZE (32 * 1024)
 #define READ_THREAD_STACKSIZE  \
((DEFAULT_THREAD_STACKSIZE < PTHREAD_STACK_MIN) ?   \
PTHREAD_STACK_MIN : DEFAULT_THREAD_STACKSIZE)
-- 
2.16.1



Re: [Xen-devel] [PATCH] libxl: set channel devid when not provided by application

2018-02-21 Thread Jim Fehlig

Any comments on this patch? Thanks!

Regards,
Jim

On 02/07/2018 08:04 PM, Jim Fehlig wrote:

Applications like libvirt may not populate a device devid field,
delegating that to libxl. If needed, the application can later
retrieve the libxl-produced devid. Indeed most devices are handled
this way in libvirt, channel devices included.

This works well when only one channel device is defined, but more
than one results in

qemu-system-i386: -chardev socket,id=libxl-channel-1,\
path=/tmp/test-org.qemu.guest_agent.00,server,nowait:
Duplicate ID 'libxl-channel-1' for chardev

Besides the odd '-1' value in the id, multiple channels have the same
id, causing qemu to fail. A simple fix is to set an uninitialized
devid (-1) to the dev_num passed to libxl__init_console_from_channel().

Signed-off-by: Jim Fehlig 
---

I get the feeling that if needed devid should be set earlier, but
this seems like the most opportune spot. Suggestions for improvements
welcome.

  tools/libxl/libxl_console.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c
index 39d8430df8..8faf3a24f3 100644
--- a/tools/libxl/libxl_console.c
+++ b/tools/libxl/libxl_console.c
@@ -401,6 +401,9 @@ int libxl__init_console_from_channel(libxl__gc *gc,
  
  /* Perform validation first, allocate second. */
  
+if (channel->devid == -1)
+channel->devid = dev_num;
+
  if (!channel->name) {
  LOG(ERROR, "channel %d has no name", channel->devid);
  return ERROR_INVAL;





Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-22 Thread Jim Fehlig

On 02/21/2018 10:18 PM, Juergen Gross wrote:

On 21/02/18 23:13, Jim Fehlig wrote:

On several Skylake machines I've observed xl segfaults when running
create or destroy subcommands. Other subcommands may segfault too,
but I've only looked at create and destroy which share a similar
backtrace

Thread 2 (Thread 0x77ff3700 (LWP 2941)):
 at /usr/include/bits/unistd.h:44
 at xs.c:398
 fd=) at xs.c:1231

Thread 1 has canceled Thread 2 and is waiting for it in pthread_join().

The backtrace smelled of memory/stack overflow, which was verified by
increasing DEFAULT_THREAD_STACKSIZE to 32kb. Presumably the stack
overflow is observed on Skylake due to a broader CPU feature set which
must be saved within _dl_runtime_resolve and friends.

While PTHREAD_STACK_MIN should advertise a suitable stack size based on
the underlying system, increasing the default size makes xenstore a bit
more robust on systems with insufficient/broken minimums.


We hit something like this before:

https://lists.xen.org/archives/html/xen-devel/2016-07/msg01727.html

The main problem is that any thread local storage is taken from the
stack without any interface being available for adjusting the _real_
stack size instead of the memory for thread local storage + stack.

So we can increase the stack size of the xenstore thread and wait for
the next breakage, or we have to think about a proper solution.

Right now I have no sensible idea how to address the problem, as the
old thread suggests the underlying glibc problem isn't fixed yet (wow:
the problem is known for more than 7 years now):

https://sourceware.org/bugzilla/show_bug.cgi?id=11787


It looks like the bug I'm hitting is described in

https://sourceware.org/bugzilla/show_bug.cgi?id=22636

And unlike the other bug, it has been fixed.

Regards,
Jim


Re: [Xen-devel] [PATCH] tools/xenstore: try to get minimum thread stack size for watch thread

2018-02-23 Thread Jim Fehlig

On 02/22/2018 06:53 AM, Juergen Gross wrote:

When creating a pthread in xs_watch() try to get the minimal needed
size of the thread from glibc instead of using a constant. This avoids
problems when the library is used in programs with large per-thread
memory.

Use dlsym() to get the pointer to __pthread_get_minstack() in order to
avoid linkage problems and fall back to the current constant size if
not found.

Signed-off-by: Juergen Gross 
---
Only compile tested. Jim, can you please verify this patch is solving
your original problem?


It didn't help, but it could be due to my buggy glibc

# gdb xl
...
(gdb) r create test-hvm.xl
Starting program: /usr/sbin/xl create test-hvm.xl
Parsing config from test-hvm.xl
Program received signal SIGSEGV, Segmentation fault.
0x772d51c2 in __pthread_get_minstack () from /lib64/libpthread.so.0
(gdb) thr a a bt

Thread 1 (Thread 0x77fd8780 (LWP 2568)):
#0  0x772d51c2 in __pthread_get_minstack () from /lib64/libpthread.so.0
#1  0x766ae259 in xs_watch (h=0x5578fc90,
path=path@entry=0x55798fa0 "/local/domain/0/device-model/2/state",
token=token@entry=0x557990b0 "3/0") at xs.c:826
#2  0x779476f4 in libxl__ev_xswatch_register 
(gc=gc@entry=0x557955f0,
w=w@entry=0x55797468, func=func@entry=0x7793dd10 
,
path=0x55798fa0 "/local/domain/0/device-model/2/state") at 
libxl_event.c:638

#3  0x7793deb0 in libxl__xswait_start (gc=gc@entry=0x557955f0,
xswa=xswa@entry=0x557973e0) at libxl_aoutils.c:53
#4  0x779326b0 in libxl__spawn_spawn (egc=egc@entry=0x7fffd950,
ss=ss@entry=0x55797370) at libxl_exec.c:292
#5  0x779258d3 in libxl__spawn_local_dm (egc=0x7fffd950, 
dmss=)

at libxl_dm.c:2400
#6  0x7791d3a7 in domcreate_launch_dm (egc=0x7fffd950, 
multidev=0x55798168,

ret=) at libxl_create.c:1379
#7  0x77967275 in libxl__bootloader_run (egc=egc@entry=0x7fffd950,
bl=bl@entry=0x55796cc0) at libxl_bootloader.c:403
#8  0x7791ffe3 in initiate_domain_create (egc=egc@entry=0x7fffd950,
dcs=dcs@entry=0x55796610) at libxl_create.c:997
#9  0x779201a1 in do_domain_create (ctx=ctx@entry=0x5578f2a0,
d_config=d_config@entry=0x7fffdb70, domid=domid@entry=0x7fffdaa8,
restore_fd=restore_fd@entry=-1, send_back_fd=send_back_fd@entry=-1, 
params=params@entry=0x0,

ao_how=0x0, aop_console_how=0x0) at libxl_create.c:1682
#10 0x779204b6 in libxl_domain_create_new (ctx=0x5578f2a0,
d_config=d_config@entry=0x7fffdb70, domid=domid@entry=0x7fffdaa8,
ao_how=ao_how@entry=0x0, aop_console_how=aop_console_how@entry=0x0) at 
libxl_create.c:1885

#11 0x555780b4 in create_domain (dom_info=dom_info@entry=0x7fffe0b0)
at xl_vmcontrol.c:902
#12 0x555790c4 in main_create (argc=1, argv=0x7fffe378) at 
xl_vmcontrol.c:1207

#13 0x55560c5b in main (argc=2, argv=0x7fffe370) at xl.c:384

If you like, I can try a patched glibc after the weekend :-).

Regards,
Jim


---
  tools/xenstore/Makefile |  4 
  tools/xenstore/xs.c | 19 ++-
  2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile
index 2b99d2bc1b..fb6c73e297 100644
--- a/tools/xenstore/Makefile
+++ b/tools/xenstore/Makefile
@@ -100,6 +100,10 @@ libxenstore.so.$(MAJOR): libxenstore.so.$(MAJOR).$(MINOR)
ln -sf $< $@
  
  xs.opic: CFLAGS += -DUSE_PTHREAD

+ifeq ($(CONFIG_Linux),y)
+xs.opic: CFLAGS += -DUSE_DLSYM
+xs.opic: LDFLAGS += -ldl
+endif
  
  libxenstore.so.$(MAJOR).$(MINOR): xs.opic xs_lib.opic

$(CC) $(LDFLAGS) $(PTHREAD_LDFLAGS) -Wl,$(SONAME_LDFLAG) 
-Wl,libxenstore.so.$(MAJOR) $(SHLIB_LDFLAGS) -o $@ $^ $(LDLIBS_libxentoolcore) 
$(SOCKET_LIBS) $(PTHREAD_LIBS) $(APPEND_LDFLAGS)
diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index abffd9cd80..8372f5b1a4 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -47,6 +47,11 @@ struct xs_stored_msg {
  
  #include 
  
+#ifdef USE_DLSYM
+#define __USE_GNU
+#include 
+#endif
+
  struct xs_handle {
/* Communications channel to xenstore daemon. */
int fd;
@@ -810,12 +815,24 @@ bool xs_watch(struct xs_handle *h, const char *path, const char *token)
if (!h->read_thr_exists) {
sigset_t set, old_set;
pthread_attr_t attr;
+   static size_t stack_size;
+#ifdef USE_DLSYM
+   size_t (*getsz)(void);
+#endif
  
+		if (!stack_size) {

+#ifdef USE_DLSYM
+   getsz = dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
+   stack_size = getsz ? getsz() : READ_THREAD_STACKSIZE;
+#else
+   stack_size = READ_THREAD_STACKSIZE;
+#endif
+   }
if (pthread_attr_init(&attr) != 0) {
mutex_unlock(&h->request_mutex);
return false;
}

Re: [Xen-devel] [PATCH v2] tools/xenstore: try to get minimum thread stack size for watch thread

2018-02-26 Thread Jim Fehlig

On 02/26/2018 01:46 AM, Juergen Gross wrote:

When creating a pthread in xs_watch() try to get the minimal needed
size of the thread from glibc instead of using a constant. This avoids
problems when the library is used in programs with large per-thread
memory.

Use dlsym() to get the pointer to __pthread_get_minstack() in order to
avoid linkage problems and fall back to the current constant size if
not found.

Signed-off-by: Juergen Gross 
---
V2:
- use _GNU_SOURCE (Wei Liu)
- call __pthread_get_minstack() with parameter
- add -ldl to correct make flags
- ensure to not using smaller stack size than today
---
  tools/xenstore/Makefile |  4 
  tools/xenstore/xs.c | 21 -
  2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile
index 2b99d2bc1b..0831be0b6f 100644
--- a/tools/xenstore/Makefile
+++ b/tools/xenstore/Makefile
@@ -100,6 +100,10 @@ libxenstore.so.$(MAJOR): libxenstore.so.$(MAJOR).$(MINOR)
ln -sf $< $@
  
  xs.opic: CFLAGS += -DUSE_PTHREAD

+ifeq ($(CONFIG_Linux),y)
+xs.opic: CFLAGS += -DUSE_DLSYM
+libxenstore.so.$(MAJOR).$(MINOR): LDFLAGS += -ldl
+endif


Dropping this patch in one of my automated builds caused a libxenstore link 
failure

[   99s] gcc -lsystemd -ldl -pthread -Wl,-soname -Wl,libxenstore.so.3.0 
-shared -o libxenstore.so.3.0.3 xs.opic xs_lib.opic 
/home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/libs/toolcore/libxentoolcore.so 

[   99s] 
/home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/xenstore/libxenstore.so: 
undefined reference to `dlsym'


I hacked around it by appending '-ldl' to the end of the subsequent 
libxenstore.so rule.



  libxenstore.so.$(MAJOR).$(MINOR): xs.opic xs_lib.opic
$(CC) $(LDFLAGS) $(PTHREAD_LDFLAGS) -Wl,$(SONAME_LDFLAG) 
-Wl,libxenstore.so.$(MAJOR) $(SHLIB_LDFLAGS) -o $@ $^ $(LDLIBS_libxentoolcore) 
$(SOCKET_LIBS) $(PTHREAD_LIBS) $(APPEND_LDFLAGS)
diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index abffd9cd80..77700bff2b 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -16,6 +16,8 @@
  License along with this library; If not, see 
<http://www.gnu.org/licenses/>.
  */
  
+#define _GNU_SOURCE
+
  #include 
  #include 
  #include 
@@ -47,6 +49,10 @@ struct xs_stored_msg {
  
  #include 
  
+#ifdef USE_DLSYM
+#include 
+#endif
+
  struct xs_handle {
/* Communications channel to xenstore daemon. */
int fd;
@@ -810,12 +816,25 @@ bool xs_watch(struct xs_handle *h, const char *path, const char *token)
if (!h->read_thr_exists) {
sigset_t set, old_set;
pthread_attr_t attr;
+   static size_t stack_size;
+#ifdef USE_DLSYM
+   size_t (*getsz)(pthread_attr_t *attr);
+#endif
  
  		if (pthread_attr_init(&attr) != 0) {

mutex_unlock(&h->request_mutex);
return false;
}
-   if (pthread_attr_setstacksize(&attr, READ_THREAD_STACKSIZE) != 
0) {
+   if (!stack_size) {
+#ifdef USE_DLSYM
+   getsz = dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
+   if (getsz)
+   stack_size = getsz(&attr);
+#endif
+   if (stack_size < READ_THREAD_STACKSIZE)
+   stack_size = READ_THREAD_STACKSIZE;
+   }
+   if (pthread_attr_setstacksize(&attr, stack_size) != 0) {
pthread_attr_destroy(&attr);
mutex_unlock(&h->request_mutex);
return false;


This worked fine, even on the system with the buggy glibc.

Tested-by: Jim Fehlig 

Regards,
Jim


[Xen-devel] [PATCH V2] libxl: set channel devid when not provided by application

2018-02-26 Thread Jim Fehlig
Applications like libvirt may not populate a device devid field,
delegating that to libxl. If needed, the application can later
retrieve the libxl-produced devid. Indeed most devices are handled
this way in libvirt, channel devices included.

This works well when only one channel device is defined, but more
than one results in

qemu-system-i386: -chardev socket,id=libxl-channel-1,\
path=/tmp/test-org.qemu.guest_agent.00,server,nowait:
Duplicate ID 'libxl-channel-1' for chardev

Besides the odd '-1' value in the id, multiple channels have the same
id, causing qemu to fail. A simple fix is to set an uninitialized
devid (-1) to the dev_num passed to libxl__init_console_from_channel().

Signed-off-by: Jim Fehlig 
---

V2:
Set console devid to channel devid as part of initializing a console
from a channel.

 tools/libxl/libxl_console.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c
index 39d8430df8..9a02a23c2a 100644
--- a/tools/libxl/libxl_console.c
+++ b/tools/libxl/libxl_console.c
@@ -401,6 +401,9 @@ int libxl__init_console_from_channel(libxl__gc *gc,
 
 /* Perform validation first, allocate second. */
 
+if (channel->devid == -1)
+channel->devid = dev_num;
+
 if (!channel->name) {
 LOG(ERROR, "channel %d has no name", channel->devid);
 return ERROR_INVAL;
@@ -446,7 +449,7 @@ int libxl__init_console_from_channel(libxl__gc *gc,
 abort();
 }
 
-console->devid = dev_num;
+console->devid = channel->devid;
 console->consback = LIBXL__CONSOLE_BACKEND_IOEMU;
 console->backend_domid = channel->backend_domid;
 console->name = libxl__strdup(NOGC, channel->name);
-- 
2.16.1
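
A small standalone sketch of the failure and the fix described in the commit message. The struct and helper names are hypothetical stand-ins, not libxl types, and the "libxl-channel%d" format is an assumption inferred from the 'libxl-channel-1' id in the qemu error above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for a libxl channel device. */
struct channel {
    int devid;          /* -1 means "not set by the application" */
    const char *name;
};

/* Same rule as the patch: only fill in a devid the caller left unset. */
static void channel_set_default_devid(struct channel *ch, int dev_num)
{
    assert(ch != NULL);
    if (ch->devid == -1)
        ch->devid = dev_num;
}

/* Assumed id format, inferred from the qemu error message. */
static int channel_chardev_id(const struct channel *ch, char *buf, size_t len)
{
    return snprintf(buf, len, "libxl-channel%d", ch->devid);
}

/* Returns 1 when two channels would be given the same chardev id. */
static int channel_ids_collide(const struct channel *a, const struct channel *b)
{
    char ida[32], idb[32];

    channel_chardev_id(a, ida, sizeof(ida));
    channel_chardev_id(b, idb, sizeof(idb));
    return strcmp(ida, idb) == 0;
}
```

With two channels left at devid -1, channel_ids_collide() reports a clash. After calling channel_set_default_devid() with dev_num 0 and 1 respectively, the ids differ, while a devid the application set explicitly is left alone.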



Re: [Xen-devel] [PATCH v2] tools/xenstore: try to get minimum thread stack size for watch thread

2018-03-02 Thread Jim Fehlig

On 03/02/2018 05:40 AM, Wei Liu wrote:

On Fri, Mar 02, 2018 at 12:29:31PM +, Wei Liu wrote:

On Mon, Feb 26, 2018 at 09:53:38AM -0700, Jim Fehlig wrote:

On 02/26/2018 01:46 AM, Juergen Gross wrote:

When creating a pthread in xs_watch() try to get the minimal needed
size of the thread from glibc instead of using a constant. This avoids
problems when the library is used in programs with large per-thread
memory.

Use dlsym() to get the pointer to __pthread_get_minstack() in order to
avoid linkage problems and fall back to the current constant size if
not found.

Signed-off-by: Juergen Gross 
---
V2:
- use _GNU_SOURCE (Wei Liu)
- call __pthread_get_minstack() with parameter
- add -ldl to correct make flags
- ensure to not using smaller stack size than today
---
   tools/xenstore/Makefile |  4 
   tools/xenstore/xs.c | 21 -
   2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile
index 2b99d2bc1b..0831be0b6f 100644
--- a/tools/xenstore/Makefile
+++ b/tools/xenstore/Makefile
@@ -100,6 +100,10 @@ libxenstore.so.$(MAJOR): libxenstore.so.$(MAJOR).$(MINOR)
ln -sf $< $@
   xs.opic: CFLAGS += -DUSE_PTHREAD
+ifeq ($(CONFIG_Linux),y)
+xs.opic: CFLAGS += -DUSE_DLSYM
+libxenstore.so.$(MAJOR).$(MINOR): LDFLAGS += -ldl
+endif


Dropping this patch in one of my automated builds caused a libxenstore link 
failure

[   99s] gcc -lsystemd -ldl -pthread -Wl,-soname -Wl,libxenstore.so.3.0
-shared -o libxenstore.so.3.0.3 xs.opic xs_lib.opic 
/home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/libs/toolcore/libxentoolcore.so

[   99s] 
/home/abuild/rpmbuild/BUILD/xen-4.10.0-testing/tools/xenstore/../../tools/xenstore/libxenstore.so:
undefined reference to `dlsym'

I hacked around it by appending '-ldl' to the end of the subsequent
libxenstore.so rule.


Hmm... Maybe I'm a bit dense today. I know the position of -l matters
but I don't quite understand how placing -pthread before xs.opic works
but -ldl doesn't. xs.c uses both after all.


I'm indeed very dense -- -pthread is a special option that sets the
proper flags for linking pthread library for both the preprocessor and
linker.

But still, Juergen must have tested the change, so I wonder why it
doesn't work in your setup. What is your build environment? Gcc version?


I dropped the patch in a package build on the openSUSE build service, where gcc7 
was used. But I don't see the problem when building from sources with gcc7. 
Apparently we have a bug in our package build, so ignore this comment. Tested-by 
still stands though :-).


Regards,
Jim


Re: [Xen-devel] [libvirt test] 118006: regressions - FAIL

2018-01-16 Thread Jim Fehlig

On 01/15/2018 07:49 AM, osstest service owner wrote:

flight 118006 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118006/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
  build-amd64-libvirt   6 libvirt-build    fail REGR. vs. 117772
  build-i386-libvirt    6 libvirt-build    fail REGR. vs. 117772
  build-arm64-libvirt   6 libvirt-build    fail REGR. vs. 117772
  build-armhf-libvirt   6 libvirt-build    fail REGR. vs. 117772


Should be fixed by

https://libvirt.org/git/?p=libvirt.git;a=commit;h=66aa7e02c69cd90995f29dbfaca6c659ffe11693

Regards,
Jim


Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"

2020-03-26 Thread Jim Fehlig

On 3/25/20 1:11 AM, Jan Beulich wrote:

On 24.03.2020 19:39, Julien Grall wrote:

On 24/03/2020 16:13, Jan Beulich wrote:

On 24.03.2020 16:21, Hongyan Xia wrote:

From: Hongyan Xia 
In contrast,
after dropping that commit, parallel domain destructions will just fail
to take the domctl lock, creating a hypercall continuation and backing
off immediately, allowing the thread that holds the lock to destroy a
domain much more quickly and allowing backed-off threads to process
events and irqs.

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes


This wants comparing against numbers demonstrating the bad effects of
the global domctl lock. Iirc they were quite a bit higher than 6 min,
perhaps depending on guest properties.


Your original commit message doesn't contain any clue in which
cases the domctl lock was an issue. So please provide information
on the setups you think it will make it worse.


I did never observe the issue myself - let's see whether one of the SUSE
people possibly involved in this back then recall (or have further
pointers; Jim, Charles?), or whether any of the (partly former) Citrix
folks do. My vague recollection is that the issue was the tool stack as
a whole stalling for far too long in particular when destroying very
large guests.


I too only have a vague memory of the issue but do recall shutting down large 
guests (e.g. 500GB) taking a long time and blocking other toolstack operations. 
I haven't checked on the behavior in quite some time though.



One important aspect not discussed in the commit message
at all is that holding the domctl lock blocks basically _all_ tool stack
operations (including e.g. creation of new guests), whereas the new
issue attempted to be addressed is limited to just domain cleanup.


I more vaguely recall shutting down the host taking a *long* time when dom0 had 
large amounts of memory, e.g. when it had all host memory (no dom0_mem= setting 
and autoballooning enabled).


Regards,
Jim
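
The back-off behaviour described in the quoted commit message (fail to take the lock, return immediately, retry later) can be pictured with a userspace analogy. This is only a sketch under stated assumptions: a pthread mutex and a return code stand in for Xen's domctl lock and hypercall continuation, and every name in it is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

enum op_result { OP_DONE = 0, OP_RETRY = 1 };

/* Stand-in for a single global lock shared by all callers. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Try to run 'work' under the global lock. If the lock is busy, do not
 * wait: report OP_RETRY so the caller can handle other events and come
 * back later, which is the role the continuation plays in the message. */
static enum op_result try_locked_op(void (*work)(void *), void *arg)
{
    assert(work != NULL);
    if (pthread_mutex_trylock(&global_lock) != 0)
        return OP_RETRY;
    work(arg);
    pthread_mutex_unlock(&global_lock);
    return OP_DONE;
}

/* Example work item: count how many times it ran. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}
```

A caller that receives OP_RETRY never spins on the lock, so the holder is not slowed down by waiters. The trade-off raised in the replies is that everything else needing the same global lock is held up for as long as one long operation owns it.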



Re: [libvirt test] 149773: regressions - FAIL

2020-04-24 Thread Jim Fehlig

On 4/24/20 3:53 AM, osstest service owner wrote:

flight 149773 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149773/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
  build-amd64-libvirt   6 libvirt-build    fail REGR. vs. 146182
  build-i386-libvirt    6 libvirt-build    fail REGR. vs. 146182
  build-arm64-libvirt   6 libvirt-build    fail REGR. vs. 146182
  build-armhf-libvirt   6 libvirt-build    fail REGR. vs. 146182


Probably best to disable these tests to avoid all the spam.

Regards,
Jim



[Xen-devel] [OSSTEST PATCH] build: fix configuration of libvirt

2020-02-12 Thread Jim Fehlig
libvirt.git commit 2621d48f00 removed the last traces of gnulib, which
also removed the '--no-git' option from autogen.sh. Unknown options are
now passed to the configure script, which quickly fails with

  configure: error: unrecognized option: `--no-git'

Remove the gnulib handling from ts-libvirt-build, including the '--no-git'
option to autogen.sh. While at it remove configure options no longer
supported by the libvirt configure script.

Signed-off-by: Jim Fehlig 
---

I have poor perl skills, but hopefully this fixes the latest build
failures of the libvirt test project, e.g.

http://logs.test-lab.xenproject.org/osstest/logs/146921/build-amd64-libvirt/6.ts-libvirt-build.log


 ts-libvirt-build | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/ts-libvirt-build b/ts-libvirt-build
index e799f003..ac5afcf2 100755
--- a/ts-libvirt-build
+++ b/ts-libvirt-build
@@ -26,8 +26,7 @@ tsreadconfig();
 selectbuildhost(\@ARGV);
 builddirsprops();
 
-our %submodmap = qw(gnulib gnulib
-keycodemapdb keycodemapdb);
+our %submodmap = qw(keycodemapdb keycodemapdb);
 our $submodules;
 
 sub libvirtd_init ();
@@ -50,12 +49,6 @@ sub config() {
 }
 die "no xen prefix" unless $xenprefix;
 
-# Uses --no-git because otherwise autogen.sh will undo
-# submodulefixup's attempts to honour
-# revision_libvirt_gnulib. This in turn requires that we specify
-# --gnulib-srcdir, but ./autogen.sh doesn't propagate
-# --gnulib-srcdir to ./bootstap so we use GNULIB_SRCDIR directly.
-my $gnulib = submodule_find($submodules, "gnulib");
 target_cmd_build($ho, 3600, $builddir, <{Path} \\
-../autogen.sh --no-git \\
- --with-libxl --without-xen --without-xenapi --without-selinux \\
- --without-lxc --without-vbox --without-uml \\
+../autogen.sh \\
+ --with-libxl --without-selinux \\
+ --without-lxc --without-vbox \\
  --without-qemu --without-openvz --without-vmware \\
  --sysconfdir=/etc --localstatedir=/var #/
 END
-- 
2.25.0



[Xen-devel] [OSSTEST PATCH V2] build: fix configuration of libvirt

2020-02-14 Thread Jim Fehlig
libvirt.git commit 2621d48f00 removed the last traces of gnulib, which
also removed the '--no-git' option from autogen.sh. Unknown options are
now passed to the configure script, which quickly fails with

  configure: error: unrecognized option: `--no-git'

Remove the gnulib handling from ts-libvirt-build, including the '--no-git'
option to autogen.sh. While at it remove configure options no longer
supported by the libvirt configure script.

Signed-off-by: Jim Fehlig 
---

The only change from V1 is adding Ian to cc.

I have poor perl skills, but hopefully this fixes the latest build
failures of the libvirt test project, e.g.

http://logs.test-lab.xenproject.org/osstest/logs/146921/build-amd64-libvirt/6.ts-libvirt-build.log

 ts-libvirt-build | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/ts-libvirt-build b/ts-libvirt-build
index e799f003..ac5afcf2 100755
--- a/ts-libvirt-build
+++ b/ts-libvirt-build
@@ -26,8 +26,7 @@ tsreadconfig();
 selectbuildhost(\@ARGV);
 builddirsprops();
 
-our %submodmap = qw(gnulib gnulib
-keycodemapdb keycodemapdb);
+our %submodmap = qw(keycodemapdb keycodemapdb);
 our $submodules;
 
 sub libvirtd_init ();
@@ -50,12 +49,6 @@ sub config() {
 }
 die "no xen prefix" unless $xenprefix;
 
-# Uses --no-git because otherwise autogen.sh will undo
-# submodulefixup's attempts to honour
-# revision_libvirt_gnulib. This in turn requires that we specify
-# --gnulib-srcdir, but ./autogen.sh doesn't propagate
-# --gnulib-srcdir to ./bootstap so we use GNULIB_SRCDIR directly.
-my $gnulib = submodule_find($submodules, "gnulib");
 target_cmd_build($ho, 3600, $builddir, <{Path} \\
-../autogen.sh --no-git \\
- --with-libxl --without-xen --without-xenapi --without-selinux \\
- --without-lxc --without-vbox --without-uml \\
+../autogen.sh \\
+ --with-libxl --without-selinux \\
+ --without-lxc --without-vbox \\
  --without-qemu --without-openvz --without-vmware \\
  --sysconfdir=/etc --localstatedir=/var #/
 END
-- 
2.25.0



Re: [Xen-devel] [OSSTEST PATCH V2] build: fix configuration of libvirt

2020-02-17 Thread Jim Fehlig

On 2/14/20 10:47 AM, Ian Jackson wrote:

Jim Fehlig writes ("[OSSTEST PATCH V2] build: fix configuration of libvirt"):

libvirt.git commit 2621d48f00 removed the last traces of gnulib, which
also removed the '--no-git' option from autogen.sh. Unknown options are
now passed to the configure script, which quickly fails with

   configure: error: unrecognized option: `--no-git'

Remove the gnulib handling from ts-libvirt-build, including the '--no-git'
option to autogen.sh. While at it remove configure options no longer
supported by the libvirt configure script.


Harmf.  Thanks for looking into this and trying to fix this mess.

I think there is a problem with your patch, which is that 2621d48f00
is recent enough that we might want still to be able to build with
earlier versions.


Ah, good point.


Is there an easy way to tell (by looking at the tree after checkout,
maybe) whether to do the old or the new thing ?


There would be no gnulib directory in a tree checked out after commit 
2621d48f00. Another option is to check for the 'bootstrap' script in the root of 
the tree, which was removed by 2621d48f00.



Your perl code looks good to me for what it is trying to do.


I'm afraid my perl is too weak to quickly hack something up to support both pre 
and post gnulib builds :-(. I'll add this task to my list if you don't have time 
for it.


Regards,
Jim
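
A minimal sketch of the tree check suggested above, assuming the two markers named in this reply (the 'bootstrap' script and the gnulib directory). It is written in C purely for illustration; ts-libvirt-build itself is Perl, where a plain file test would do the same job. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if <tree_root>/<name> exists (file or directory), else 0. */
static int tree_has_entry(const char *tree_root, const char *name)
{
    char path[4096];
    int n;

    assert(tree_root != NULL && name != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", tree_root, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return 0;   /* treat an over-long path as "not present" */
    return access(path, F_OK) == 0;
}

/* Hypothetical predicate: a tree from before commit 2621d48f00 still
 * carries the bootstrap script or the gnulib directory. */
static int tree_uses_gnulib(const char *tree_root)
{
    return tree_has_entry(tree_root, "bootstrap") ||
           tree_has_entry(tree_root, "gnulib");
}
```

A build script could use such a predicate to decide whether to pass '--no-git' and the gnulib settings. A stale, untracked gnulib directory left over from an earlier checkout would give a false positive, so checking only for 'bootstrap' may be the safer of the two tests.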


Re: [Xen-devel] libvirt support for scheduler credit2

2020-01-22 Thread Jim Fehlig
On 1/21/20 10:05 AM, Jürgen Groß wrote:
> On 21.01.20 17:56, Kevin Stange wrote:
>> Hi,
>>
>> I looked around a bit and wasn't able to find a good answer to this, so
>> George suggested I ask here.
> 
> Cc-ing Jim.
> 
>>
>> Since Xen 4.12, credit2 is the default scheduler, but at least as of
>> libvirt 5.1.0 virsh doesn't appear to understand credit2 and produces
>> this sort of output:

You would see the same with libvirt.git master, sorry. ATM the libvirt libxl 
driver is unaware of the credit2 scheduler. Hmm, as I recall Dario was going to 
provide a patch for libvirt :-). But he is quite busy so it will have to be 
added to my very long todo list.

Regards,
Jim

>>
>> # xl sched-credit2 -d yw6hk7mo6zy3k8
>> Name    ID Weight  Cap
>> yw6hk7mo6zy3k8   4 10    0
>> # virsh schedinfo yw6hk7mo6zy3k8
>> Scheduler  : credit2
>>
>> Compared to a host running credit:
>>
>> # xl sched-credit -d gvz2b16sq38dv9
>> Name    ID Weight  Cap
>> gvz2b16sq38dv9  14    800    0
>> # virsh schedinfo gvz2b16sq38dv9
>> Scheduler  : credit
>> weight : 800
>> cap    : 0
>>
>> Trying to change the weight does nothing, not even producing an error
>> message:
>>
>> # virsh schedinfo syuxplsmdihcwc --weight 300
>> Scheduler  : credit2
>>
>> # xl sched-credit2 -d syuxplsmdihcwc
>> Name    ID Weight  Cap
>> syuxplsmdihcwc  23    400    0
>>
>> Is there a version of libvirt where I can expect this to work, or is it
>> not supported yet?  As a workaround for now I've added sched=credit to
>> my command line, but it would be nice to gain the benefits of improved
>> scheduling at some point.
>>
> 
> 


Re: [Xen-devel] libvirt support for scheduler credit2

2020-01-29 Thread Jim Fehlig
On 1/29/20 4:10 AM, Dario Faggioli wrote:
> On Wed, 2020-01-22 at 18:56 +0000, Jim Fehlig wrote:
>> On 1/21/20 10:05 AM, Jürgen Groß wrote:
>>> On 21.01.20 17:56, Kevin Stange wrote:
>>>>
>>>> Since Xen 4.12, credit2 is the default scheduler, but at least as
>>>> of
>>>> libvirt 5.1.0 virsh doesn't appear to understand credit2 and
>>>> produces
>>>> this sort of output:
>>
>> You would see the same with libvirt.git master, sorry. ATM the
>> libvirt libxl
>> driver is unaware of the credit2 scheduler.
>>
> Right. I Just sent the patch:
> https://www.redhat.com/archives/libvir-list/2020-January/msg01292.html

Thanks! I tweaked it a bit and committed to libvirt.git

https://libvirt.org/git/?p=libvirt.git;a=commit;h=849052ec61e18780713bec171748e859e32dfd6d

Regards,
Jim

Re: [PATCH 00/14] deprecations: remove many old deprecations

2021-02-25 Thread Jim Fehlig

Adding xen-devel and Ian to cc.

On 2/24/21 6:11 AM, Daniel P. Berrangé wrote:

The following features have been deprecated for well over the 2
release cycle we promise


This reminded me of a bug report we received late last year when updating to 
5.2.0. 'virsh setvcpus' suddenly stopped working for Xen HVM guests. Turns out 
libxl uses cpu-add under the covers.




   ``-usbdevice`` (since 2.10.0)
   ``-drive file=json:{...{'driver':'file'}}`` (since 3.0)
   ``-vnc acl`` (since 4.0.0)
   ``-mon ...,control=readline,pretty=on|off`` (since 4.1)
   ``migrate_set_downtime`` and ``migrate_set_speed`` (since 2.8.0)
   ``query-named-block-nodes`` result ``encryption_key_missing`` (since 2.10.0)
   ``query-block`` result ``inserted.encryption_key_missing`` (since 2.10.0)
   ``migrate-set-cache-size`` and ``query-migrate-cache-size`` (since 2.11.0)
   ``query-named-block-nodes`` and ``query-block`` result dirty-bitmaps[i].status (since 4.0)
   ``query-cpus`` (since 2.12.0)
   ``query-cpus-fast`` ``arch`` output member (since 3.0.0)
   ``query-events`` (since 4.0)
   chardev client socket with ``wait`` option (since 4.0)
   ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0)
   ``ide-drive`` (since 4.2)
   ``scsi-disk`` (since 4.2)

AFAICT, libvirt has ceased to use all of these too.


A quick grep of the libxl code shows it uses -usbdevice, query-cpus, and 
scsi-disk.


There are many more similarly old deprecations not (yet) tackled.


The Xen tools maintainers will need to be more vigilant about the deprecations. I 
don't follow Xen development closely enough to know if this topic has already been 
discussed.


Regards,
Jim




Re: [libvirt test] 151910: regressions - FAIL

2020-07-15 Thread Jim Fehlig

On 7/15/20 9:07 AM, osstest service owner wrote:

flight 151910 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/151910/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
  build-amd64-libvirt   6 libvirt-build    fail REGR. vs. 151777
  build-i386-libvirt    6 libvirt-build    fail REGR. vs. 151777
  build-arm64-libvirt   6 libvirt-build    fail REGR. vs. 151777
  build-armhf-libvirt   6 libvirt-build    fail REGR. vs. 151777


I see the same configure failure has been encountered since July 11

checking for XDR... no
configure: error: You must install the libtirpc >= 0.1.10 pkg-config module to 
compile libvirt


AFAICT there have been no related changes in libvirt (which has required 
libtirpc for over two years). Has this package changed in debian, or no longer 
part of a base build config?


Regards,
Jim




[PATCH] OSSTEST: Install libtirpc-dev for libvirt builds

2020-07-23 Thread Jim Fehlig
The check for XDR support was changed in libvirt commit d7147b3797
to use libtirpc pkg-config instead of complicated AC_CHECK_LIB,
AC_COMPILE_IFELSE, et al. logic. The libvirt OSSTEST has been
failing since this change hit libvirt.git master. Fix it by adding
libtirpc-dev to the list of 'extra_packages' installed for libvirt
builds.

Signed-off-by: Jim Fehlig 
---

I *think* this change will work for older libvirt branches too.
The old, hand-coded m4 logic should work with libtirpc-dev
installed.

 Osstest/Toolstack/libvirt.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/Toolstack/libvirt.pm b/Osstest/Toolstack/libvirt.pm
index e817f5b4..11e4d730 100644
--- a/Osstest/Toolstack/libvirt.pm
+++ b/Osstest/Toolstack/libvirt.pm
@@ -26,7 +26,7 @@ use XML::LibXML;
 
 sub new {
 my ($class, $ho, $methname,$asset) = @_;
-my @extra_packages = qw(libavahi-client3);
+my @extra_packages = qw(libavahi-client3 libtirpc-dev);
 my $nl_lib = "libnl-3-200";
 my $libgnutls = "libgnutls30";
 
-- 
2.26.2




Re: [libvirt test] 151910: regressions - FAIL

2020-07-23 Thread Jim Fehlig

On 7/15/20 1:53 PM, Jim Fehlig wrote:

On 7/15/20 9:07 AM, osstest service owner wrote:

flight 151910 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/151910/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
  build-amd64-libvirt   6 libvirt-build    fail REGR. vs. 151777
  build-i386-libvirt    6 libvirt-build    fail REGR. vs. 151777
  build-arm64-libvirt   6 libvirt-build    fail REGR. vs. 151777
  build-armhf-libvirt   6 libvirt-build    fail REGR. vs. 151777


I see the same configure failure has been encountered since July 11

checking for XDR... no
configure: error: You must install the libtirpc >= 0.1.10 pkg-config module to 
compile libvirt


AFAICT there have been no related changes in libvirt (which has required 
libtirpc for over two years).


Sorry for the mistake. There has been a change in libvirt

https://gitlab.com/libvirt/libvirt/-/commit/d7147b3797380de2d159ce6324536f3e1f2d97e3

My reputation for OSSTEST patches is not the greatest, but I took a stab at it 
regardless :-)


https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg01208.html

Regards,
Jim




Re: [PATCH] OSSTEST: Install libtirpc-dev for libvirt builds

2020-08-10 Thread Jim Fehlig

On 8/10/20 4:13 AM, Ian Jackson wrote:

Jim Fehlig writes ("[PATCH] OSSTEST: Install libtirpc-dev for libvirt builds"):

The check for XDR support was changed in libvirt commit d7147b3797
to use libtirpc pkg-config instead of complicated AC_CHECK_LIB,
AC_COMPILE_IFELSE, et al. logic. The libvirt OSSTEST has been
failing since this change hit libvirt.git master. Fix it by adding
libtirpc-dev to the list of 'extra_packages' installed for libvirt
builds.

Signed-off-by: Jim Fehlig 


Reviewed-by: Ian Jackson 

Thanks!  I will push this to osstest pretest shortly.


Thanks Ian! Perhaps you've noticed libvirt has now moved to the meson build 
system. My weak perl skills have discouraged me from investigating ways to 
accommodate that.


Regards,
Jim




Re: [libvirt test] 149773: regressions - FAIL

2020-06-04 Thread Jim Fehlig

On 6/4/20 6:51 AM, Ian Jackson wrote:

Jim Fehlig writes ("Re: [libvirt test] 149773: regressions - FAIL"):

On 4/24/20 3:53 AM, osstest service owner wrote:

flight 149773 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149773/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
   build-amd64-libvirt   6 libvirt-build    fail REGR. vs. 146182
   build-i386-libvirt    6 libvirt-build    fail REGR. vs. 146182
   build-arm64-libvirt   6 libvirt-build    fail REGR. vs. 146182
   build-armhf-libvirt   6 libvirt-build    fail REGR. vs. 146182


Probably best to disable these tests to avoid all the spam.


I have fixed the build bug now...


I saw your patch on the libvirt dev list, thanks! I'm a bit embarrassed for not 
considering a fix on the libvirt side while trying to address this a few months 
back :-/.


I suspect the upcoming move to meson will be a bit more disruptive and will 
likely require changes to osstest.


Regards,
Jim



Re: [Discussion]: Making "LIBXL_HOTPLUG_TIMEOUT" configurable through 'xl.conf'

2023-11-24 Thread Jim Fehlig

On 11/24/23 06:04, Olaf Hering wrote:

Fri, 24 Nov 2023 13:47:53 +0100 Juergen Gross :


As Olaf has said already: this wouldn't cover actions e.g. by libvirt.


Jim pointed me to /etc/libvirt/libxl.conf. So from this perspective both
xl and libvirt are covered. Now it just takes someone to implement it.


I like Juergen's idea of libxl.conf or xen.conf for Xen. This would avoid the 
duplicate effort of adding support for such host-wide settings to the 
configuration of external libxl toolstacks like libvirt. And external stacks 
could immediately use any new settings added to the Xen configuration.


Regards,
Jim




vnuma_nodes missing pnode 0

2022-11-11 Thread Jim Fehlig

Hi All,

While fixing [1] a recent downstream libvirt build failure against 4.17 rc3, I 
noticed the JSON representation of libxl_vnode_info omits pnode when its value 
is 0. The problem can be seen by starting a VM containing the following vnuma 
config:


vnuma = [ [ "pnode=0", "size=2048", "vcpus=0", "vdistances=10,20" ],
          [ "pnode=1", "size=2048", "vcpus=1", "vdistances=20,10" ] ]


The json representation for this config does not contain pnode 0

   "vnuma_nodes": [
       {
           "memkb": 2097152,
           "distances": [
               10,
               20
           ],
           "vcpus": [
               0
           ]
       },
       {
           "memkb": 2097152,
           "distances": [
               20,
               10
           ],
           "pnode": 1,
           "vcpus": [
               1
           ]
       }
   ],

I'm not familiar with the code generator for the *_to_json functions, but with a 
hint I can probably cook up a patch :-).


Regards,
Jim

[1] https://listman.redhat.com/archives/libvir-list/2022-November/235745.html



Re: vnuma_nodes missing pnode 0

2022-11-14 Thread Jim Fehlig

On 11/14/22 01:18, Jan Beulich wrote:

On 14.11.2022 07:43, Henry Wang wrote:

Sorry, missed Anthony (The toolstack maintainer). Also added him
to this thread.


Indeed there's nothing x86-ish in here, it's all about data representation.
It merely happens to be (for now) x86-specific data which is being dealt
with.

Internally I indicated to Jim that the way the code presently is generated
it looks to me as if 0 was simply taken as the default for "pnode". What I
don't know at all is whether the concept of any kind of default is actually
valid in json representation of guest configs.


0 is definitely ignored in the generated libxl_vnode_info_gen_json() function, 
which essentially has


if (p->pnode)
  format-json

I took a quick peek at the generator, but being totally unfamiliar with it, I 
could not spot a fix. I'm also not sure how such a fix could be detected for 
testing purposes by libxl users like libvirt, i.e. how to tell a libxl that emits 
`"pnode": 0` in the JSON representation of a libxl_domain_config object from one 
that does not.
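
One way a test could sidestep the detection problem is to normalize the JSON before comparing it, so that both libxl behaviours yield the same result. The sketch below assumes that an omitted ``pnode`` always means 0, which is only a working assumption at this point in the thread; the helper name is hypothetical, and it takes the ``vnuma_nodes`` list directly because the surrounding document structure is not shown here.

```python
import copy
import json


def fill_pnode_default(vnuma_nodes, default=0):
    """Return a copy of a vnuma_nodes list with any missing "pnode" filled in."""
    filled = copy.deepcopy(vnuma_nodes)
    for node in filled:
        # Assumption: an omitted key stands for the default value.
        node.setdefault("pnode", default)
    return filled


# Output from a libxl that omits pnode 0, and from a hypothetical one that emits it.
omitting = json.loads('[{"memkb": 2097152, "vcpus": [0]}]')
emitting = json.loads('[{"memkb": 2097152, "pnode": 0, "vcpus": [0]}]')
same = fill_pnode_default(omitting) == fill_pnode_default(emitting)
```

With this kind of normalization the expected test data would not need to know which form a given libxl produces.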


Jim




Re: vnuma_nodes missing pnode 0

2022-11-14 Thread Jim Fehlig

On 11/14/22 10:56, Anthony PERARD wrote:

On Mon, Nov 14, 2022 at 08:53:17AM -0700, Jim Fehlig wrote:

On 11/14/22 01:18, Jan Beulich wrote:

On 14.11.2022 07:43, Henry Wang wrote:

Sorry, missed Anthony (The toolstack maintainer). Also added him
to this thread.


Indeed there's nothing x86-ish in here, it's all about data representation.
It merely happens to be (for now) x86-specific data which is being dealt
with.

Internally I indicated to Jim that the way the code presently is generated
it looks to me as if 0 was simply taken as the default for "pnode". What I
don't know at all is whether the concept of any kind of default is actually
valid in json representation of guest configs.


0 is definitely ignored in the generated libxl_vnode_info_gen_json()
function, which essentially has

if (p->pnode)
   format-json

I took a quick peek at the generator, but being totally unfamiliar with it, I
could not spot a fix. I'm also not sure how such a fix could be detected for
testing purposes by libxl users like libvirt, i.e. how to tell a libxl that
emits `"pnode": 0` in the JSON representation of a libxl_domain_config object
from one that does not.


Well, the missing '"pnode": 0' in json isn't exactly a bug; it's been done
on purpose, see 
https://xenbits.xen.org/gitweb/?p=xen.git;h=731233d64f6a7602c1ca297f7b67ec254

When the JSON is reloaded into its original struct, libxl_vnode_info,
pnode will have the expected value, that is 0, because
libxl_vnode_info_init() will have reset this field to 0.
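
The round trip described above can be illustrated with a toy model. This is not the libxl generator: the class below is a stand-in for libxl_vnode_info whose field defaults play the role of libxl_vnode_info_init(), and applying the "omit if equal to the initial value" rule to every field is an assumption made for the sketch.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class VnodeInfo:
    """Toy stand-in for libxl_vnode_info; defaults act as the init function."""
    memkb: int = 0
    pnode: int = 0
    distances: list = field(default_factory=list)
    vcpus: list = field(default_factory=list)


def to_json(node):
    """Serialize only the fields that differ from their initial value."""
    init = VnodeInfo()
    out = {
        f.name: getattr(node, f.name)
        for f in fields(node)
        if getattr(node, f.name) != getattr(init, f.name)
    }
    return json.dumps(out, sort_keys=True)


def from_json(text):
    """Start from the initial value, then overwrite what the JSON carries."""
    return VnodeInfo(**json.loads(text))


node0 = VnodeInfo(memkb=2097152, pnode=0, distances=[10, 20], vcpus=[0])
text0 = to_json(node0)          # contains no "pnode" key
restored = from_json(text0)     # pnode comes back as 0 from the default
```

Nothing is lost in the round trip: the key is absent from the text, yet the reloaded object compares equal to the original.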

I don't think it's possible to change the generator to just have it
generate '"pnode": 0', as if we make a change, it would have to be for
all unsigned ints, I think.


Which would likely cause lots of libvirt libxlxml2domconfig test failures.


Is it actually wanted to have all those in json, or is it just a case of
looking like there's a missing part?


The latter. ATM, libvirt only uses the json in its unit tests. No functionality 
is affected. I'm fine with the status quo if you are :-).


Thanks,
Jim