You have been subscribed to a public bug:

---Problem Description---
https://github.com/open-power/supermicro-openpower/issues/59
SW/HW Configuration
PNOR image version: 5/3/2016
BMC image version: 0.25
CPLD version: B2.81.01
Host OS version: Ubuntu 16.04 LTS
UbuntuKVM guest OS version: Ubuntu 14.04.4 LTS
HTX version: 394
Processor: 00UL865 * 2
Memory: SK hynix 16GB 2Rx4 PC4-2133P * 16

Summary of Issue
Two UbuntuKVM guests are each configured with 8 processors, 64 GB of memory, one 128 GB disk, one network interface, and one GPU (passed through from the host's K80). The two guests are each put into a create/destroy loop, with HTX running on each guest (NOT on the host) between its creation and destruction. The mdt.bu profile is used, putting the processors, memory, and the GPU under load. Each HTX session lasts 9 minutes.

While this runs, the amount of available (free) memory in the host OS slowly decreases, and this can continue until there is no free memory left for the host to do anything, including creating the two VM guests. It appears that after every cycle a small portion of the memory allocated to the guest is not released back to the host, and eventually this adds up until all of the host's available memory is consumed. At some point the guest(s) may get disconnected and display the following error:

error: Disconnected from qemu:///system due to I/O error
error: One or more references were leaked after disconnect from the hypervisor

Then, when the host tries to start the guest again, the following error shows up:

error: Failed to create domain from guest2_trusty.xml
error: internal error: early end of file from monitor, possible problem:
Unexpected error in spapr_alloc_htab() at /build/qemu-c3ZrbA/qemu-2.5+dfsg/hw/ppc/spapr.c:1030:
2016-05-23T16:18:16.871549Z qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem

The host OS syslog, as seen HERE, also contains quite a few errors.
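For reference, the create/destroy cycle described in the summary can be sketched as a small shell loop. Everything in this sketch is illustrative: the guest name, XML path, cycle count, and sleep interval are placeholders chosen by me, not values taken from this report.

```shell
#!/bin/sh
# Hypothetical sketch of the reported create/destroy loop.
# GUEST, GUEST_XML, CYCLES, and the sleep interval are illustrative placeholders.
GUEST=guest2
GUEST_XML=guest2_trusty.xml
CYCLES=4000

i=0
if command -v virsh >/dev/null 2>&1; then
    while [ "$i" -lt "$CYCLES" ]; do
        virsh create "$GUEST_XML" || break   # starts failing once host memory is exhausted
        sleep 540                            # ~9 minutes of HTX (mdt.bu) load inside the guest
        virsh destroy "$GUEST"
        # Log host free memory after each cycle; with the reported leak,
        # this value trends steadily downward.
        awk -v n="$i" '/MemFree:/ {print n, $2, "kB"}' /proc/meminfo >> host_memfree.log
        i=$(( i + 1 ))
    done
    echo "stopped after $i cycles"
else
    echo "virsh not installed; skipping demo loop"
fi
```

The per-cycle MemFree log is what makes the slow leak visible: per the description, each iteration leaves slightly less free memory than the one before.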
To list just a few:

May 13 20:27:44 191-136 kernel: [36827.151228] alloc_contig_range: [3fb800, 3fd8f8) PFNs busy
May 13 20:27:44 191-136 kernel: [36827.151291] alloc_contig_range: [3fb800, 3fd8fc) PFNs busy
May 13 20:27:44 191-136 libvirtd[19263]: *** Error in `/usr/sbin/libvirtd': realloc(): invalid next size: 0x000001000a780400 ***
May 13 20:27:44 191-136 libvirtd[19263]: ======= Backtrace: =========
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x8720c)[0x3fffaf6a720c]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x96f70)[0x3fffaf6b6f70]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(realloc+0x16c)[0x3fffaf6b87fc]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virReallocN+0x68)[0x3fffaf90ccc8]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so(+0x8ef6c)[0x3fff9346ef6c]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so(+0xa826c)[0x3fff9348826c]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virEventPollRunOnce+0x8b4)[0x3fffaf9332b4]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virEventRunDefaultImpl+0x54)[0x3fffaf931334]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virNetDaemonRun+0x1f0)[0x3fffafad2f70]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/sbin/libvirtd(+0x15d74)[0x52e45d74]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x2319c)[0x3fffaf64319c]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8)[0x3fffaf6433b8]
May 13 20:27:44 191-136 libvirtd[19263]: ======= Memory map: ========
May 13 20:27:44 191-136 libvirtd[19263]: 52e30000-52eb0000 r-xp 00000000 08:02 65540510 /usr/sbin/libvirtd
May 13 20:27:44 191-136 libvirtd[19263]: 52ec0000-52ed0000 r--p 00080000 08:02 65540510 /usr/sbin/libvirtd
May 13 20:27:44 191-136 libvirtd[19263]: 52ed0000-52ee0000 rw-p 00090000 08:02 65540510 /usr/sbin/libvirtd
May 13 20:27:44 191-136 libvirtd[19263]: 1000a730000-1000a830000 rw-p 00000000 00:00 0 [heap]
May 13 20:27:44 191-136 libvirtd[19263]: 3fff60000000-3fff60030000 rw-p 00000000 00:00 0
May 13 20:27:44 191-136 libvirtd[19263]: 3fff60030000-3fff64000000 ---p 00000000 00:00 0
May 13 20:50:33 191-136 kernel: [38196.502926] audit: type=1400 audit(1463197833.497:4025): apparmor="DENIED" operation="open" profile="libvirt-d3ade785-c1c1-4519-b123-9d28704c2ad4" name="/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.0/0003:03:00.0/devspec" pid=24887 comm="qemu-system-ppc" requested_mask="r" denied_mask="r" fsuid=110 ouid=0
May 13 20:50:33 191-136 virtlogd[3727]: End of file while reading data: Input/output error

Notes
The host's free memory also slowly decreases when HTX is NOT executed on the guests between create/destroy cycles, although at a much slower pace. Guest creation can still fail with the same error message even while the host apparently has plenty of free memory left:

error: Failed to create domain from guest2_trusty.xml
error: internal error: early end of file from monitor, possible problem:
Unexpected error in spapr_alloc_htab() at /build/qemu-c3ZrbA/qemu-2.5+dfsg/hw/ppc/spapr.c:1030:
2016-05-23T16:18:16.871549Z qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem

However, this has happened only once so far, after about 3,924 create/destroy cycles. The other guest running the same test concurrently did NOT have any issues and went on to 4,600+ cycles.

---uname output---
Host OS version: Ubuntu 16.04 LTS
UbuntuKVM guest OS version: Ubuntu 14.04.4 LTS
Machine Type = SMC

I do not see any actual evidence here that all memory is being used:

1. "Failed to allocate HTAB" happens because we run out of _contiguous_ chunks of CMA memory, not just any RAM.

2. libvirtd[19263]: *** Error in `/usr/sbin/libvirtd': realloc(): invalid next size: 0x000001000a780400 *** looks more like memory corruption than insufficient memory.

I suggest collecting statistics using something like this shell script:

#!/bin/sh
while true
do
    <here you put guest start/stop>
    grep -e "\(CmaFree:\|MemFree:\)" /proc/meminfo | paste -d "\t" - - >> mymemorylog
done

and attaching the resulting mymemorylog to this bug. It would also be interesting to know whether the issue can be reproduced without the NVIDIA driver loaded in the guest, or even without passing the NVIDIA GPU to the guest at all. Meanwhile I am running my own tests to see if I can reproduce this behavior.

OK, located the problem; I will post a patch to the public lists tomorrow. Basically, when QEMU dies, its DMA pages are unpinned when its memory context is destroyed. That was expected to happen when the QEMU process exits, but it can actually happen a lot later: if a kernel thread ran on that same context and took a reference to it, the final memory context release does not happen until that thread is scheduled again.

== Comment: #15 - Leonardo Augusto Guimaraes Garcia <lagar...@br.ibm.com> - 2016-08-24 08:15:00 ==
(In reply to comment #14)
> On my host, I have 10 guests running. Sum of all 10 guests memory will come
> up to 69GB.

OK... so this is quite different from what is in the bug description, which reads:

"Two UbuntuKVM guests are each configured with 8 processors, 64 GB of memory, 1 disk of 128 GB, 1 network interface, and 1 GPU (pass-through'd from the Host OS's K80). The two guests are each put into a Create/Destroy loop, with HTX running on each of the guests (NOT HOST) in between its creation and destruction. The mdt.bu profile is used, and the processors, memory, and the GPU are put under load. The HTX session lasts 9 minutes."

What is the scenario being worked on in this bug?
I suggest you open a new bug for your issue if needed, and we continue to investigate the original issue here.

> I am trying to bring up an 11th guest, which has 5 GB of memory, and it fails:
>
> root@lotkvm:~# virsh start --console lotg12
> error: Failed to start domain lotg12
> error: internal error: process exited while connecting to monitor:
> 5076802818bda30000000000003f2,format=raw,if=none,id=drive-virtio-disk0
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive file=/dev/disk/by-id/wwn-0x6005076802818bda30000000000003f4,format=raw,if=none,id=drive-virtio-disk1
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1
> -netdev tap,fd=41,id=hostnet0
> -device virtio-net,netdev=hostnet0,id=net0,mac=52:54:00:9b:53:77,bus=pci.0,addr=0x1,bootindex=2
> -chardev pty,id=charserial0
> -device spapr-vty,chardev=charserial0,reg=0x30000000
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2
> -msg timestamp=on
> 2016-08-24T12:00:50.375315Z qemu-system-ppc64: Failed to allocate KVM HPT of
> order 26 (try smaller maxmem?): Cannot allocate memory

This is not because you don't have available memory; it is because you don't have CMA memory available. Please take a look at LTC bug 145072, comment 5 and the subsequent comments.

> I waited for an hour and retried the guest start. It still fails.
>
> Current memory on host:
> -----------
> root@lotkvm:~# free -g
>               total        used        free      shared  buff/cache   available
> Mem:            127          73           0           0          53          53
> Swap:            11           4           6

I think there are actually two separate problems here.

(A) Pages in the CMA zone are getting pinned, causing fragmentation of the CMA zone and leading to the messages saying "qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem". This happens because the guest is doing PCI passthrough with DDW enabled and hence pins all its memory.
If guest pages happen to be allocated in the CMA zone, they get pinned there and then cannot be moved out of the way of a future HPT allocation. Balbir was looking at the possibility of moving the pages out of the CMA zone before pinning them, but this work depends on some upstream refactoring that seems to have stalled.

(B) On VM destruction, the pages are not unpinned and freed in a timely fashion. Alexey debugged this issue and has posted two patches to fix it: "powerpc/iommu: Stop using @current in mm_iommu_xxx" and "powerpc/mm/iommu: Put pages on process exit". These patches touch two maintainers' areas (powerpc and vfio) and hence need concurrence from both maintainers, so they haven't gone anywhere yet. (Of course, issue (B) exacerbates issue (A).)

After moving the host and guests to the 4.8 kernel, almost all of the host's memory is still being used. Any updates here? Any patches we can expect soon? Please let us know.

Thanks, Manju

4.8 does not yet have the fix for the pinned page migration. I am not sure of the upstream status of https://patchwork.kernel.org/patch/9238861/; I checked to see if I could find it in any git tree, but could not. I suspect we need that fix in first.

> Balbir - Is this fixed in the latest 4.8 kernel out today?

My patch is in powerpc-next:
https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=2e5bbb5461f138cac631fe21b4ad956feabfba22
It should hit 4.9, and we can backport it. I am also trying to work on improvements to the patch for the future. Not sure of aik's patch status.

Balbir Singh

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: architecture-ppc64le bot-comment bugnameltc-143509 severity-high targetmilestone-inin16041

-- 
Host OS slowly runs out of available memory after running UbuntuKVM Guest Create/Destroy loop #59
https://bugs.launchpad.net/bugs/1632045

You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp