------- Comment From swgre...@us.ibm.com 2017-11-10 14:39 EDT-------
(In reply to comment #31)
> Hi Scott,
> the howto is mixed for Desktop users, Server users and selective upgrades.
> For your case you only need the most simple case which would be:
>
> Essentially you want to:
>
> # Check - all other updates done (to clear the view)
> $ apt list --upgradable
> Listing... Done
>
> # Enable proposed for z on Server
> $ echo "deb http://ports.ubuntu.com/ubuntu-ports/ xenial-proposed main
> restricted universe multiverse" | sudo tee
> /etc/apt/sources.list.d/enable-proposed.list
> $ sudo apt update
> $ apt list --upgradable
> [...]
> linux-headers-generic/xenial-proposed 4.4.0.100.105 s390x [upgradable from:
> 4.4.0.98.103]
> linux-headers-virtual/xenial-proposed 4.4.0.100.105 s390x [upgradable from:
> 4.4.0.98.103]
> linux-image-virtual/xenial-proposed 4.4.0.100.105 s390x [upgradable from:
> 4.4.0.98.103]
>
> # Install just the kernels from proposed
> $ sudo apt install linux-generic
>
> No need to set apt prefs if you only do a selective install.
> If you'd do a global "sudo apt upgrade" you'd get all, but that is likely
> not what you want in your case. After you have done so you can just
> enable/disable the line in /etc/apt/sources.list.d/enable-proposed.list as
> needed.
>
> Hope that helps

Yes, your instructions were immensely useful, thanks for the
explanation.

With the proposed fix applied, I am now able to start over 100 virtual
guests, even with aio-max-nr set to 64K:

root@zm93k8:~# cat /proc/sys/fs/aio-max-nr
65535
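
To see how close a host is to this limit, the kernel also exposes the
number of currently reserved AIO events in /proc/sys/fs/aio-nr. A
minimal sketch, assuming a POSIX shell; the helper name aio_headroom is
made up for illustration and is not part of any package:

```shell
#!/bin/sh
# aio_headroom CURRENT MAX
# Prints how many more AIO events could be reserved before the
# fs.aio-max-nr limit is reached. (Hypothetical helper name.)
aio_headroom() {
    echo "$(( $2 - $1 ))"
}

# On a live host, feed it the two kernel counters:
#   aio_headroom "$(cat /proc/sys/fs/aio-nr)" "$(cat /proc/sys/fs/aio-max-nr)"
```

A result near zero would suggest that the next guest using aio=native
may fail to start.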

root@zm93k8:/tmp# virsh list |grep running
86    zs93kag70041                   running
87    zs93kag70042                   running
88    zs93kag70055                   running
89    zs93kag70056                   running
90    zs93kag70057                   running
91    zs93kag70058                   running
92    zs93kag70059                   running
93    zs93kag70060                   running
94    zs93kag70061                   running
95    zs93kag70062                   running
96    zs93kag70063                   running
97    zs93kag70064                   running
98    zs93kag70065                   running
99    zs93kag70066                   running
100   zs93kag70067                   running
101   zs93kag70068                   running
102   zs93kag70069                   running
103   zs93kag70070                   running
104   zs93kag70071                   running
105   zs93kag70072                   running
106   zs93kag70073                   running
107   zs93kag70074                   running
108   zs93kag70075                   running
109   zs93kag70077                   running
110   zs93kag70078                   running
111   zs93kag70079                   running
112   zs93kag70080                   running
113   zs93kag70081                   running
114   zs93kag70082                   running
115   zs93kag70083                   running
116   zs93kag70084                   running
117   zs93kag70085                   running
118   zs93kag70086                   running
119   zs93kag70087                   running
120   zs93kag70088                   running
121   zs93kag70089                   running
122   zs93kag70090                   running
123   zs93kag70091                   running
124   zs93kag70092                   running
125   zs93kag70093                   running
126   zs93kag70094                   running
127   zs93kag70095                   running
128   zs93kag70096                   running
129   zs93kag70097                   running
130   zs93kag70098                   running
131   zs93kag70099                   running
132   zs93kag70100                   running
133   zs93kag70101                   running
134   zs93kag70102                   running
135   zs93kag70103                   running
136   zs93kag70104                   running
137   zs93kag70105                   running
138   zs93kag70106                   running
139   zs93kag70107                   running
140   zs93kag70108                   running
141   zs93kag70109                   running
142   zs93kag70110                   running
143   zs93kag70111                   running
144   zs93kag70112                   running
145   zs93kag70113                   running
146   zs93kag70114                   running
147   zs93kag70115                   running
148   zs93kag70116                   running
149   zs93kag70117                   running
150   zs93kag70118                   running
151   zs93kag70119                   running
152   zs93kag70120                   running
153   zs93kag70121                   running
154   zs93kag70122                   running
155   zs93kag70123                   running
156   zs93kag70124                   running
157   zs93kag70125                   running
158   zs93kag70126                   running
159   zs93kag70127                   running
160   zs93kag70128                   running
161   zs93kag70129                   running
162   zs93kag70130                   running
163   zs93kag70131                   running
164   zs93kag70132                   running
165   zs93kag70133                   running
166   zs93kag70134                   running
167   zs93kag70135                   running
168   zs93kag70136                   running
169   zs93kag70137                   running
170   zs93kag70138                   running
172   zs93kag70024                   running
173   zs93kag70025                   running
174   zs93kag70026                   running
175   zs93kag70027                   running
176   zs93kag70038                   running
177   zs93kag70039                   running
178   zs93kag70040                   running
179   zs93kag70043                   running
180   zs93kag70044                   running
181   zs93kag70045                   running
182   zs93kag70046                   running
183   zs93kag70047                   running
184   zs93kag70048                   running
185   zs93kag70049                   running
186   zs93kag70050                   running
187   zs93kag70051                   running
188   zs93kag70052                   running
189   zs93kag70053                   running
190   zs93kag70054                   running
191   zs93kag70076                   running
root@zm93k8:/tmp#
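
Rather than reading a listing this long by eye, the running guests can
be counted. A small sketch, assuming the usual three-column virsh list
layout shown above; the function name count_running is hypothetical:

```shell
#!/bin/sh
# count_running: reads `virsh list` output on stdin and prints the
# number of domains whose last column (the state) is "running".
count_running() {
    awk '$NF == "running" { n++ } END { print n + 0 }'
}

# Usage on a KVM host:
#   virsh list | count_running
```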

When will this fix be available to external customers? We will want to
recommend it to our zKVM users. Thank you!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1717224

Title:
  virsh start of virtual guest domain fails with internal error due to
  low default aio-max-nr sysctl value

Status in Ubuntu on IBM z Systems:
  In Progress
Status in kvm package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  In Progress
Status in procps package in Ubuntu:
  New
Status in kvm source package in Xenial:
  New
Status in linux source package in Xenial:
  In Progress
Status in procps source package in Xenial:
  New
Status in kvm source package in Zesty:
  New
Status in linux source package in Zesty:
  In Progress
Status in procps source package in Zesty:
  New
Status in kvm source package in Artful:
  Confirmed
Status in linux source package in Artful:
  In Progress
Status in procps source package in Artful:
  New

Bug description:
  Starting virtual guests via virsh on Ubuntu 16.04.2 LTS installed with
  its KVM hypervisor on an IBM Z14 system LPAR fails on the 18th guest
  with the following error:

  root@zm93k8:/rawimages/ubu1604qcow2# virsh start zs93kag70038
  error: Failed to start domain zs93kag70038
  error: internal error: process exited while connecting to monitor: 2017-07-26T01:48:26.352534Z qemu-kvm: -drive file=/guestimages/data1/zs93kag70038.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native: Could not open backing file: Could not set AIO state: Inappropriate ioctl for device

  The previous 17 guests started fine:

  root@zm93k8# virsh start zs93kag70020
  Domain zs93kag70020 started

  root@zm93k8# virsh start zs93kag70021
  Domain zs93kag70021 started

  .
  .

  root@zm93k8:/rawimages/ubu1604qcow2# virsh start zs93kag70036
  Domain zs93kag70036 started

  
  We ended up fixing the issue by adding the following line to
  /etc/sysctl.conf:

  fs.aio-max-nr = 4194304

  ... then, reload the sysctl config file:

  root@zm93k8:/etc# sysctl -p /etc/sysctl.conf
  fs.aio-max-nr = 4194304
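
An alternative to editing /etc/sysctl.conf directly is a drop-in file
under /etc/sysctl.d/, which "sysctl --system" also loads. This is only a
sketch: the file name 90-aio-max-nr.conf and the helper name
write_aio_dropin are assumptions, not something taken from this report.

```shell
#!/bin/sh
# write_aio_dropin DIR VALUE
# Writes a sysctl drop-in file into DIR that sets fs.aio-max-nr to VALUE.
# (Hypothetical helper and file name, for illustration only.)
write_aio_dropin() {
    printf 'fs.aio-max-nr = %s\n' "$2" > "$1/90-aio-max-nr.conf"
}

# On a real host, as root:
#   write_aio_dropin /etc/sysctl.d 4194304
#   sysctl --system
```

Keeping the setting in its own file makes it easier to remove or adjust
later without touching the distribution's sysctl.conf.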

  
  Now, we're able to start more guests...

  root@zm93k8:/etc# virsh start zs93kag70036
  Domain zs93kag70036 started

  
  The default value was originally set to 65536:

  root@zm93k8:/rawimages/ubu1604qcow2# cat /proc/sys/fs/aio-max-nr
  65536

  
  Note: we chose the value 4194304 because this is what our KVM on
  System Z hypervisor ships as its default. E.g., on our zKVM system:

  [root@zs93ka ~]# cat /proc/sys/fs/aio-max-nr
  4194304

  ubuntu@zm93k8:/etc$ lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 16.04.2 LTS
  Release:        16.04
  Codename:       xenial
  ubuntu@zm93k8:/etc$

  ubuntu@zm93k8:/etc$ dpkg -s qemu-kvm |grep Version
  Version: 1:2.5+dfsg-5ubuntu10.8

  Is something already documented for Ubuntu KVM users warning them about
  the low default value, with some guidance on how to select an
  appropriate value? Also, would you consider increasing the default
  aio-max-nr value to something much higher, to accommodate significantly
  more virtual guests?

  Thanks!

  ---uname output---
  ubuntu@zm93k8:/etc$ uname -a
  Linux zm93k8 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:12:54 UTC 2017 s390x s390x s390x GNU/Linux
   
  Machine Type = z14 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   See Problem Description.

  The problem was happening a week ago, so this may not reflect that
  activity.

  This file was collected on Aug 7, one week after we were hitting the
  problem.  If I need to reproduce the problem and get fresh data,
  please let me know.

  /var/log/messages doesn't exist on this system, so I provided syslog
  output instead.

  All data have been collected too late after the problem was observed
  over a week ago.  If you need me to reproduce the problem and get new
  data, please let me know.  That's not a problem.

  Also, we would have to make special arrangements for login access to
  these systems.  I'm happy to run traces and data collection for you as
  needed.  If that's not sufficient, then we'll explore log in access
  for you.

  Thanks...   - Scott G.

  
  I was able to successfully recreate the problem, and captured and
  attached new debug docs.

  Recreate procedure:

  #  Started out with no virtual guests running.

  ubuntu@zm93k8:/home/scottg$ virsh list
   Id    Name                           State
  ----------------------------------------------------

  
  # Set fs.aio-max-nr back to the original Ubuntu "out of the box" value
  # in /etc/sysctl.conf

  ubuntu@zm93k8:~$ tail -1 /etc/sysctl.conf
  fs.aio-max-nr = 65536

  
  ## Before the reload, sysctl -a still shows the raised value:

  fs.aio-max-nr = 4194304

  
  ##  Reload sysctl.

  ubuntu@zm93k8:~$ sudo sysctl -p /etc/sysctl.conf
  fs.aio-max-nr = 65536
  ubuntu@zm93k8:~$

  ubuntu@zm93k8:~$ sudo sysctl -a |grep fs.aio-max-nr
  fs.aio-max-nr = 65536

  ubuntu@zm93k8:~$  cat /proc/sys/fs/aio-max-nr
  65536


  # Attempt to start more than 17 qcow2 virtual guests on the Ubuntu
  host.  Fails on the 18th XML.

  Script used to start guests..

  
  ubuntu@zm93k8:/home/scottg$ date;./start_privs.sh
  Wed Aug 23 13:21:25 EDT 2017
  virsh start zs93kag70015
  Domain zs93kag70015 started

  Started zs93kag70015 succesfully ...

  virsh start zs93kag70020
  Domain zs93kag70020 started

  Started zs93kag70020 succesfully ...

  virsh start zs93kag70021
  Domain zs93kag70021 started

  Started zs93kag70021 succesfully ...

  virsh start zs93kag70022
  Domain zs93kag70022 started

  Started zs93kag70022 succesfully ...

  virsh start zs93kag70023
  Domain zs93kag70023 started

  Started zs93kag70023 succesfully ...

  virsh start zs93kag70024
  Domain zs93kag70024 started

  Started zs93kag70024 succesfully ...

  virsh start zs93kag70025
  Domain zs93kag70025 started

  Started zs93kag70025 succesfully ...

  virsh start zs93kag70026
  Domain zs93kag70026 started

  Started zs93kag70026 succesfully ...

  virsh start zs93kag70027
  Domain zs93kag70027 started

  Started zs93kag70027 succesfully ...

  virsh start zs93kag70028
  Domain zs93kag70028 started

  Started zs93kag70028 succesfully ...

  virsh start zs93kag70029
  Domain zs93kag70029 started

  Started zs93kag70029 succesfully ...

  virsh start zs93kag70030
  Domain zs93kag70030 started

  Started zs93kag70030 succesfully ...

  virsh start zs93kag70031
  Domain zs93kag70031 started

  Started zs93kag70031 succesfully ...

  virsh start zs93kag70032
  Domain zs93kag70032 started

  Started zs93kag70032 succesfully ...

  virsh start zs93kag70033
  Domain zs93kag70033 started

  Started zs93kag70033 succesfully ...

  virsh start zs93kag70034
  Domain zs93kag70034 started

  Started zs93kag70034 succesfully ...

  virsh start zs93kag70035
  Domain zs93kag70035 started

  Started zs93kag70035 succesfully ...

  virsh start zs93kag70036
  error: Failed to start domain zs93kag70036
  error: internal error: process exited while connecting to monitor: 2017-08-23T17:21:47.131809Z qemu-kvm: -drive file=/guestimages/data1/zs93kag70036.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native: Could not open backing file: Could not set AIO state: Inappropriate ioctl for device

  Exiting script ... start zs93kag70036 failed
  ubuntu@zm93k8:/home/scottg$
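
The start_privs.sh script itself is not included in this report. A
minimal sketch of a loop that would produce output of this shape,
assuming it simply calls virsh start for each guest in order and stops
at the first failure; the function name start_guests is hypothetical:

```shell
#!/bin/sh
# start_guests NAME...
# Starts each named domain in order with `virsh start` and stops at the
# first failure, returning non-zero. (Illustrative reconstruction, not
# the original start_privs.sh.)
start_guests() {
    for guest in "$@"; do
        echo "virsh start $guest"
        if virsh start "$guest"; then
            echo "Started $guest successfully ..."
        else
            echo "Exiting script ... start $guest failed"
            return 1
        fi
    done
}

# Usage on a KVM host:
#   start_guests zs93kag70015 zs93kag70020 zs93kag70021
```

Stopping at the first failure makes it easy to see which guest in the
sequence hits the limit.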

  
  # Show that there are only 17 running guests. 

  ubuntu@zm93k8:/home/scottg$ virsh list |grep run |wc -l
  17

  ubuntu@zm93k8:/home/scottg$ virsh list
   Id    Name                           State
  ----------------------------------------------------
   25    zs93kag70015                   running
   26    zs93kag70020                   running
   27    zs93kag70021                   running
   28    zs93kag70022                   running
   29    zs93kag70023                   running
   30    zs93kag70024                   running
   31    zs93kag70025                   running
   32    zs93kag70026                   running
   33    zs93kag70027                   running
   34    zs93kag70028                   running
   35    zs93kag70029                   running
   36    zs93kag70030                   running
   37    zs93kag70031                   running
   38    zs93kag70032                   running
   39    zs93kag70033                   running
   40    zs93kag70034                   running
   41    zs93kag70035                   running


  # For fun, try starting zs93kag70036  again manually.

  ubuntu@zm93k8:/home/scottg$ date;virsh start zs93kag70036
  Wed Aug 23 13:27:28 EDT 2017
  error: Failed to start domain zs93kag70036
  error: internal error: process exited while connecting to monitor: 2017-08-23T17:27:30.031782Z qemu-kvm: -drive file=/guestimages/data1/zs93kag70036.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native: Could not open backing file: Could not set AIO state: Inappropriate ioctl for device


  # Show the XML (they're all basically the same)...

  ubuntu@zm93k8:/home/scottg$ cat zs93kag70036.xml
  <domain type='kvm'>
    <name>zs93kag70036</name>
    <memory unit='MiB'>4096</memory>
    <currentMemory unit='MiB'>2048</currentMemory>
    <vcpu placement='static'>2</vcpu>
    <os>
      <type arch='s390x' machine='s390-ccw-virtio'>hvm</type>
    </os>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>preserve</on_crash>
    <devices>
      <emulator>/usr/bin/qemu-kvm</emulator>
      <disk type='file' device='disk'>
        <driver name ='qemu' type='qcow2' cache='none' io='native'/>
        <source file='/guestimages/data1/zs93kag70036.qcow2'/>
        <target dev='vda' bus='virtio'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
        <boot order='1'/>
      </disk>
      <interface type='network'>
        <source network='privnet1'/>
        <model type='virtio'/>
        <mac address='52:54:00:70:d0:36'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
      </interface>
  <!--
      <disk type='block' device='disk'>
        <driver name ='qemu' type='raw' cache='none'/>
        <source dev='/dev/disk/by-id/dm-uuid-mpath-36005076802810e5540000000000006e4'/>
        <target dev='vde' bus='virtio'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0005'/>
        <readonly/>
      </disk>
  -->
      <disk type='file' device='disk'>
        <driver name ='qemu' type='raw' cache='none' io='native'/>
        <source file='/guestimages/data1/zs93kag70036.prm'/>
        <target dev='vdf' bus='virtio'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0006'/>
      </disk>
      <disk type='file' device='cdrom'>
        <driver name='qemu' type='raw'/>
        <source file='/guestimages/data1/zs93kag70036.iso'/>
        <target dev='sda' bus='scsi'/>
        <readonly/>
        <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      </disk>
      <controller type='usb' index='0' model='none'/>
      <memballoon model='none'/>
      <console type='pty'>
        <target type='sclp' port='0'/>
      </console>
    </devices>
  </domain>

  
  This condition is very easy to replicate. However, we may be losing
  this system in the next day or two, so please let me know ASAP if you
  need any more data. Thank you...

  - Scott G.

  == Comment: #11 - Viktor Mihajlovski <mihaj...@de.ibm.com> - 2017-09-14 
  In order to support many KVM guests it is advisable to raise aio-max-nr
  as suggested in the problem description; see also
  http://kvmonz.blogspot.co.uk/p/blog-page_7.html. I would also suggest
  that the system default setting be increased.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1717224/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
