You have been subscribed to a public bug:

---Problem Description---
https://github.com/open-power/supermicro-openpower/issues/59

SW/HW Configuration

PNOR image version: 5/3/2016
BMC image version: 0.25
CPLD Version: B2.81.01
Host OS version: Ubuntu 16.04 LTS
UbuntuKVM Guest OS version: Ubuntu 14.04.4 LTS
HTX version: 394
Processor: 00UL865 * 2
Memory: SK hynix 16GB 2Rx4 PC4-2133P * 16
Summary of Issue

Two UbuntuKVM guests are each configured with 8 processors, 64 GB of
memory, 1 disk of 128 GB, 1 network interface, and 1 GPU (pass-through'd
from the Host OS's K80).

The two guests are each put into a Create/Destroy loop, with HTX running
on each of the guests (NOT HOST) in between its creation and
destruction. The mdt.bu profile is used, and the processors, memory, and
the GPU are put under load. The HTX session lasts 9 minutes.

While this is running, the amount of free memory in the Host OS slowly
decreases, eventually to the point where there is no longer enough free
memory for the Host OS to do anything, including creating the two VM
guests. It appears that after every cycle, a small portion of the memory
that was allocated to the VM guest is not released back to the Host OS,
and over time this adds up until it consumes all the available memory in
the Host OS.

At some point, the VM guest(s) might get disconnected and will display
the following error:

    error: Disconnected from qemu:///system due to I/O error

    error: One or more references were leaked after disconnect from the
hypervisor

Then, when the Host OS tries to start the VM guest again, the following
error shows up:

    error: Failed to create domain from guest2_trusty.xml
    error: internal error: early end of file from monitor, possible problem: Unexpected error in spapr_alloc_htab() at /build/qemu-c3ZrbA/qemu-2.5+dfsg/hw/ppc/spapr.c:1030:
    2016-05-23T16:18:16.871549Z qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem

The Host OS syslog also contains quite a few errors. To list just a few:

    May 13 20:27:44 191-136 kernel: [36827.151228] alloc_contig_range: [3fb800, 3fd8f8) PFNs busy
    May 13 20:27:44 191-136 kernel: [36827.151291] alloc_contig_range: [3fb800, 3fd8fc) PFNs busy
    May 13 20:27:44 191-136 libvirtd[19263]: *** Error in `/usr/sbin/libvirtd': realloc(): invalid next size: 0x000001000a780400 ***
    May 13 20:27:44 191-136 libvirtd[19263]: ======= Backtrace: =========
    May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x8720c)[0x3fffaf6a720c]
    May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x96f70)[0x3fffaf6b6f70]
    May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(realloc+0x16c)[0x3fffaf6b87fc]
    May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virReallocN+0x68)[0x3fffaf90ccc8]
    May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so(+0x8ef6c)[0x3fff9346ef6c]
    May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so(+0xa826c)[0x3fff9348826c]
    May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virEventPollRunOnce+0x8b4)[0x3fffaf9332b4]
    May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virEventRunDefaultImpl+0x54)[0x3fffaf931334]
    May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virNetDaemonRun+0x1f0)[0x3fffafad2f70]
    May 13 20:27:44 191-136 libvirtd[19263]: /usr/sbin/libvirtd(+0x15d74)[0x52e45d74]
    May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x2319c)[0x3fffaf64319c]
    May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8)[0x3fffaf6433b8]
    May 13 20:27:44 191-136 libvirtd[19263]: ======= Memory map: ========
    May 13 20:27:44 191-136 libvirtd[19263]: 52e30000-52eb0000 r-xp 00000000 08:02 65540510 /usr/sbin/libvirtd
    May 13 20:27:44 191-136 libvirtd[19263]: 52ec0000-52ed0000 r--p 00080000 08:02 65540510 /usr/sbin/libvirtd
    May 13 20:27:44 191-136 libvirtd[19263]: 52ed0000-52ee0000 rw-p 00090000 08:02 65540510 /usr/sbin/libvirtd
    May 13 20:27:44 191-136 libvirtd[19263]: 1000a730000-1000a830000 rw-p 00000000 00:00 0 [heap]
    May 13 20:27:44 191-136 libvirtd[19263]: 3fff60000000-3fff60030000 rw-p 00000000 00:00 0
    May 13 20:27:44 191-136 libvirtd[19263]: 3fff60030000-3fff64000000 ---p 00000000 00:00 0
    May 13 20:50:33 191-136 kernel: [38196.502926] audit: type=1400 audit(1463197833.497:4025): apparmor="DENIED" operation="open" profile="libvirt-d3ade785-c1c1-4519-b123-9d28704c2ad4" name="/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.0/0003:03:00.0/devspec" pid=24887 comm="qemu-system-ppc" requested_mask="r" denied_mask="r" fsuid=110 ouid=0
    May 13 20:50:33 191-136 virtlogd[3727]: End of file while reading data: Input/output error

Notes

The Host OS's free memory also slowly decreases when HTX is NOT executed
at all on the guests between guest Create/Destroy cycles, though at a
much slower pace. VM guests can also still fail to be created, with the
same error message, even when the Host OS still has plenty of free
memory left:

    error: Failed to create domain from guest2_trusty.xml
    error: internal error: early end of file from monitor, possible problem: Unexpected error in spapr_alloc_htab() at /build/qemu-c3ZrbA/qemu-2.5+dfsg/hw/ppc/spapr.c:1030:
    2016-05-23T16:18:16.871549Z qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem

However, this has happened only once so far, after about 3,924
Create/Destroy cycles. The other guest, which was running the same test
concurrently, did NOT have any issues and went on to 4,600+ cycles.

 
---uname output---
Host OS version: Ubuntu 16.04 LTS
UbuntuKVM Guest OS version: Ubuntu 14.04.4 LTS
 
Machine Type = SMC 
 
I do not see any actual information here about all memory being used up:

1. "Failed to allocate HTAB" happens because we run out of _contiguous_
chunks of CMA memory, not just any RAM.

2. libvirtd[19263]: *** Error in `/usr/sbin/libvirtd': realloc():
invalid next size: 0x000001000a780400 *** looks more like memory
corruption than insufficient memory.
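To separate the two failure modes, it helps to watch free CMA alongside free RAM. A minimal sketch: pull only the relevant fields out of /proc/meminfo with awk. The sample file below is a made-up illustration so the command is self-contained; on the host you would point awk at /proc/meminfo directly.

```shell
# Hypothetical /proc/meminfo excerpt, for illustration only; on a real
# host, replace /tmp/meminfo.sample with /proc/meminfo.
cat > /tmp/meminfo.sample <<'EOF'
MemFree:         1048576 kB
CmaTotal:        6553600 kB
CmaFree:          131072 kB
EOF

# Print only the fields relevant to this bug (free RAM vs. free CMA).
awk '/^(MemFree|CmaFree):/ { printf "%s %d kB\n", $1, $2 }' /tmp/meminfo.sample
```

If MemFree stays healthy while CmaFree trends toward zero across cycles, the HTAB failures point at CMA exhaustion rather than a general leak.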

I suggest collecting statistics using something like this shell script:

    #!/bin/sh

    while true
    do
        # <here you put guest start/stop>
        grep -e "\(CmaFree:\|MemFree:\)" /proc/meminfo | paste -d "\t" - - >> mymemorylog
    done

and attaching the resulting mymemorylog to this bug. It would also be
interesting to know whether the issue can be reproduced without the
NVIDIA driver loaded in the guest, or even without passing the NVIDIA
GPU through to the guest. Meanwhile, I am running my own tests to see if
I can reproduce this behavior.

Ok, located the problem; I will post a patch tomorrow to the public lists.

Basically, when QEMU dies, it unpins DMA pages when its memory context
is destroyed. That was expected to happen when the QEMU process exits,
but it can actually happen a lot later: if some kernel thread executed
on this same context and still holds a reference to it, the final memory
context release does not happen until that thread is scheduled again.

== Comment: #15 - Leonardo Augusto Guimaraes Garcia <lagar...@br.ibm.com> - 
2016-08-24 08:15:00 ==
(In reply to comment #14)
> On my host, I have 10 guests running. Sum of all 10 guests memory will come
> up to 69GB.

Ok... So, this is quite different from what is in the bug description.
In the bug description, I read:

"Two UbuntuKVM guests are each configured with 8 processors, 64 GB of
memory, 1 disk of 128 GB, 1 network interface, and 1 GPU (pass-through'd
from the Host OS's K80).

The two guests are each put into a Create/Destroy loop, with HTX running
on each of the guests (NOT HOST) in between its creation and
destruction. The mdt.bu profile is used, and the processors, memory, and
the GPU are put under load. The HTX session lasts 9 minutes."

What is the scenario being worked on this bug? I suggest you open a new
bug for your issue if needed and we continue to investigate the original
issue here.

> 
> I am trying to bring up 11th guest which is having 5Gb memory and it fails:
> 
> root@lotkvm:~# virsh start --console lotg12
> error: Failed to start domain lotg12
> error: internal error: process exited while connecting to monitor:
> 5076802818bda30000000000003f2,format=raw,if=none,id=drive-virtio-disk0
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,
> id=virtio-disk0,bootindex=1 -drive
> file=/dev/disk/by-id/wwn-0x6005076802818bda30000000000003f4,format=raw,
> if=none,id=drive-virtio-disk1 -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,
> id=virtio-disk1 -netdev tap,fd=41,id=hostnet0 -device
> virtio-net,netdev=hostnet0,id=net0,mac=52:54:00:9b:53:77,bus=pci.0,addr=0x1,
> bootindex=2 -chardev pty,id=charserial0 -device
> spapr-vty,chardev=charserial0,reg=0x30000000 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 -msg timestamp=on
> 2016-08-24T12:00:50.375315Z qemu-system-ppc64: Failed to allocate KVM HPT of
> order 26 (try smaller maxmem?): Cannot allocate memory

This is not because you don't have available memory. This is because you
don't have CMA memory available. Please, take a look at LTC bug 145072
comment 5 and subsequent comments.

> 
> 
> I waited for an hour and retried guest start.. It fails still..
> 
> Current memory on host :
> -----------
> root@lotkvm:~# free -g
>               total        used        free      shared  buff/cache   available
> Mem:            127          73           0           0          53          53
> Swap:            11           4           6

I think there are actually two separate problems here.

(A) Pages in the CMA zone are getting pinned and causing fragmentation
of the CMA zone, leading to the messages saying "qemu-system-ppc64:
Failed to allocate HTAB of requested size, try with smaller maxmem".
This happens because the guest is doing PCI passthrough with DDW enabled
and hence pins all its memory. If guest pages happen to be allocated in
the CMA zone, they get pinned there and then can't be moved for a future
HPT allocation.
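The amount of contiguous CMA at stake can be estimated. QEMU aims for a hash page table of roughly 1/128 of the guest's maxmem, rounded up to a power of two, and that allocation must come from CMA. The sketch below applies that heuristic; the 1/128 ratio and the minimum order are assumptions drawn from QEMU's spapr code, not from this bug's logs.

```shell
# Rough sketch of the HPT sizing heuristic: hash table ~ maxmem/128,
# rounded up to a power of two. The 5 GB guest in the failure quoted
# above lands on order 26, i.e. 64 MiB of contiguous CMA.
maxmem_mb=5120
awk -v m="$maxmem_mb" 'BEGIN {
    target = m * 1024 * 1024 / 128    # target HPT size in bytes
    order = 18                        # assumed minimum HPT order
    while (2 ^ order < target) order++
    printf "HPT order %d (%d MiB of contiguous CMA)\n", order, 2 ^ order / 1024 / 1024
}'
```

This matches the "Failed to allocate KVM HPT of order 26" message quoted earlier: even with 53 GB reclaimable, the allocation fails if no 64 MiB contiguous CMA region is movable/free.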

Balbir was looking at the possibility of moving the pages out of the CMA
zone before pinning them, but this work was dependent on some upstream
refactoring which seems to be stalled.

(B) On VM destruction, the pages are not getting unpinned and freed in a
timely fashion. Alexey debugged this issue and has posted two patches to
fix the problem: "powerpc/iommu: Stop using @current in mm_iommu_xxx"
and "powerpc/mm/iommu: Put pages on process exit". These patches touch
two maintainers' areas (powerpc and vfio) and hence need two
maintainers' concurrence, and thus haven't gone anywhere yet.

(Of course, issue (B) exacerbates issue (A).)

After moving the host and guests to the 4.8 kernel, almost all memory is
still getting used on the host.

Any updates here? Any patches that we can expect soon? Please let us
know.

Thanks,
Manju


4.8 does not yet have the fix for the pinned page migrations. I am not sure of 
the status of https://patchwork.kernel.org/patch/9238861/ upstream. I checked 
to see if I could find it in any git tree, but could not. I suspect we need 
this fix in first.

> Balbir - Is this fixed in the latest 4.8 kernel out today?
My patch is in powerpc-next

https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=2e5bbb5461f138cac631fe21b4ad956feabfba22

It should hit 4.9 and we can backport it. I am also trying to work on
improvements to the patch for the future. Not sure of the status of
aik's patch.

Balbir Singh.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: architecture-ppc64le bot-comment bugnameltc-143509 severity-high 
targetmilestone-inin16041
-- 
Host OS slowly runs out of available memory after running UbuntuKVM Guest 
Create/Destroy loop #59
https://bugs.launchpad.net/bugs/1632045
You received this bug notification because you are a member of Kernel Packages, 
which is subscribed to linux in Ubuntu.
