On 28.06.2016 at 16:43, Peter Lieven wrote:
On 28.06.2016 at 14:56, Dr. David Alan Gilbert wrote:
* Peter Lieven (p...@kamp.de) wrote:
On 28.06.2016 at 14:29, Paolo Bonzini wrote:
On 28.06.2016 at 13:37, Paolo Bonzini wrote:
On 28/06/2016 11:01, Peter Lieven wrote:
I recently found that Qemu is using several hundred megabytes of RSS memory
more than older versions such as Qemu 2.2.0. So I started tracing memory
allocation and found 2 major reasons for this.
1) We changed the qemu coroutine pool to have a per-thread and a global release
pool. The chosen pool size and the changed algorithm can lead to up to 192 free
coroutines with just a single iothread, each of them holding 1 MB of stack memory.
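To make the accumulation concrete, here is a toy sketch of such a two-level pool
(not the QEMU code; the per-thread and global limits are assumptions chosen only
so that they add up to the 192 free coroutines mentioned above):

#include <stddef.h>
#include <stdlib.h>

#define STACK_SIZE        (1024 * 1024)  /* 1 MB per coroutine stack */
#define THREAD_POOL_MAX   64             /* assumed per-thread limit */
#define RELEASE_POOL_MAX  128            /* assumed global release-pool limit */

typedef struct Coroutine {
    struct Coroutine *next;
    void *stack;                          /* 1 MB kept alive while pooled */
} Coroutine;

static __thread Coroutine *thread_pool;   /* per-thread pool */
static __thread unsigned   thread_pool_size;
static Coroutine          *release_pool;  /* global pool (locking omitted) */
static unsigned            release_pool_size;

/* On termination a coroutine is parked rather than freed, so with both pools
 * full a single-iothread process can sit on 64 + 128 = 192 idle coroutines,
 * i.e. roughly 192 MB of stack memory. */
static void coroutine_recycle(Coroutine *co)
{
    if (thread_pool_size < THREAD_POOL_MAX) {
        co->next = thread_pool;
        thread_pool = co;
        thread_pool_size++;
    } else if (release_pool_size < RELEASE_POOL_MAX) {
        co->next = release_pool;
        release_pool = co;
        release_pool_size++;
    } else {
        free(co->stack);                  /* only now does the stack go away */
        free(co);
    }
}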
But the fix, as you correctly note, is to reduce the stack size. It
would be nice to compile block-obj-y with -Wstack-usage=2048 too.
To reveal if there are any big stack allocations in the block layer?
Yes. Most should be fixed by now, but a handful are probably still there.
(definitely one in vvfat.c).
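For illustration, the kind of allocation -Wstack-usage is meant to catch looks
roughly like this (a made-up example, not the actual vvfat.c code; with gcc and
-Wstack-usage=2048 the large local buffer triggers the warning):

#include <stdint.h>
#include <string.h>

/* Hypothetical "stack eater": compiled with gcc -Wstack-usage=2048, this
 * function is reported because its local buffer alone is far above the
 * 2048-byte limit. */
static void fill_sector_example(uint8_t *out, size_t out_len)
{
    uint8_t scratch[64 * 1024];          /* 64 KiB on the stack */

    memset(scratch, 0, sizeof(scratch));
    /* ... build the sector contents in scratch ... */
    memcpy(out, scratch, out_len < sizeof(scratch) ? out_len : sizeof(scratch));
}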
It seems that reducing the coroutine stack to 64kB breaks live migration in some
(non-reproducible) cases.
Does it hit the guard page?
What would that look like? I get segfaults like this:
segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in
qemu-system-x86_64[555ab6f2c000+794000]
Most of the time it is error 6, sometimes error 7. The segfault is near the sp.
A backtrace would be good.
Here we go. My old friend nc_sendv_compat ;-)
This has already been fixed in master. My test systems use an older Qemu ;-)
Peter
Again the question: would you go for reducing the stack size and eliminating all
stack eaters?
The static netbuf in nc_sendv_compat is no problem.
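For context, the pattern behind that remark looks roughly like this (a minimal
sketch, assuming a ~64 KiB NET_BUFSIZE-style constant; names are simplified and
this is not the actual net/net.c code):

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define NET_BUFSIZE_SKETCH (64 * 1024)  /* assumed size, matching the ~64 KiB
                                         * buffer visible in the backtrace */

static ssize_t sendv_compat_sketch(const struct iovec *iov, int iovcnt)
{
    /* Before the fix this was a plain local array, i.e. 64 KiB of coroutine
     * stack per call; declaring it static removes it from the stack budget. */
    static uint8_t buf[NET_BUFSIZE_SKETCH];
    size_t offset = 0;

    for (int i = 0; i < iovcnt; i++) {
        size_t len = iov[i].iov_len;
        if (len > sizeof(buf) - offset) {
            len = sizeof(buf) - offset;
        }
        memcpy(buf + offset, iov[i].iov_base, len);
        offset += len;
    }
    /* The real code hands buf to the single-buffer send path here. */
    return offset;
}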
And: I would go for adding the guard page without MAP_GROWSDOWN and mmapping the
rest of the stack with this flag if available (see the sketch below). That way we
are safe on non-Linux systems, on Linux before 3.9, or with merged memory regions.
Peter
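A minimal sketch of that layout could look like the following (names are made up,
this is not the QEMU implementation): a PROT_NONE guard page that never depends
on MAP_GROWSDOWN, with the usable stack above it; where MAP_GROWSDOWN exists it
could additionally be applied to the usable part so pages are only committed as
they are touched.

#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: reserve guard page + stack in one anonymous mapping, then make only
 * the part above the guard page accessible.  An overflow of the coroutine
 * stack (which grows downward) hits the PROT_NONE page and faults, regardless
 * of whether MAP_GROWSDOWN is available. */
static void *alloc_coroutine_stack_sketch(size_t stack_size)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = stack_size + pagesz;

    void *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }

    /* Usable stack: everything above the guard page.  On Linux >= 3.9 this
     * part could additionally use MAP_GROWSDOWN so pages are committed
     * lazily; the guard page itself never relies on that flag. */
    if (mprotect((char *)base + pagesz, stack_size,
                 PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }

    /* Lowest usable address; the stack pointer starts at the top of this
     * region and grows down toward the guard page. */
    return (char *)base + pagesz;
}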
---
Program received signal SIGSEGV, Segmentation fault.
0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
at net/net.c:701
(gdb) bt full
#0 0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
at net/net.c:701
buf = '\000' <repeats 65890 times>...
buffer = 0x0
offset = 0
#1 0x0000555555a2f058 in qemu_deliver_packet_iov (sender=0x5555565a46b0,
flags=0, iov=0x7ffff7e98d20, iovcnt=1, opaque=0x555557802370)
at net/net.c:745
nc = 0x555557802370
ret = 21845
#2 0x0000555555a3132d in qemu_net_queue_deliver (queue=0x555557802590,
sender=0x5555565a46b0, flags=0, data=0x55555659e2a8 "", size=74)
at net/queue.c:163
ret = -1
iov = {iov_base = 0x55555659e2a8, iov_len = 74}
#3 0x0000555555a3178b in qemu_net_queue_flush (queue=0x555557802590)
at net/queue.c:260
packet = 0x55555659e280
ret = 21845
#4 0x0000555555a2eb7a in qemu_flush_or_purge_queued_packets (
nc=0x555557802370, purge=false) at net/net.c:629
No locals.
#5 0x0000555555a2ebe4 in qemu_flush_queued_packets (nc=0x555557802370)
at net/net.c:642
No locals.
#6 0x00005555557747b7 in virtio_net_set_status (vdev=0x555556fb32a8,
status=7 '\a') at /usr/src/qemu-2.5.0/hw/net/virtio-net.c:178
ncs = 0x555557802370
queue_started = true
n = 0x555556fb32a8
__func__ = "virtio_net_set_status"
q = 0x555557308b50
i = 0
queue_status = 7 '\a'
#7 0x0000555555795501 in virtio_set_status (vdev=0x555556fb32a8, val=7 '\a')
at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:618
k = 0x55555657eb40
__func__ = "virtio_set_status"
#8 0x00005555557985e6 in virtio_vmstate_change (opaque=0x555556fb32a8,
running=1, state=RUN_STATE_RUNNING)
at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:1539
vdev = 0x555556fb32a8
qbus = 0x555556fb3240
__func__ = "virtio_vmstate_change"
k = 0x555556570420
backend_run = true
#9 0x00005555558592ae in vm_state_notify (running=1, state=RUN_STATE_RUNNING)
at vl.c:1601
e = 0x555557320cf0
next = 0x555557af4c40
#10 0x000055555585737d in vm_start () at vl.c:756
requested = RUN_STATE_MAX
#11 0x0000555555a209ec in process_incoming_migration_co (opaque=0x5555566a1600)
at migration/migration.c:392
f = 0x5555566a1600
local_err = 0x0
mis = 0x5555575ab0e0
ps = POSTCOPY_INCOMING_NONE
ret = 0
#12 0x0000555555b61efd in coroutine_trampoline (i0=1465036928, i1=21845)
at util/coroutine-ucontext.c:80
arg = {p = 0x55555752b080, i = {1465036928, 21845}}
self = 0x55555752b080
co = 0x55555752b080
#13 0x00007ffff5cb7800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#14 0x00007fffffffcb40 in ?? ()
No symbol table info available.
#15 0x0000000000000000 in ?? ()
No symbol table info available.
Dave
2) Between Qemu 2.2.0 and 2.3.0, RCU was introduced, which led to delayed freeing
of memory. This led to higher heap allocations that could not effectively be
returned to the kernel (most likely due to fragmentation).
I agree that some of the exec.c allocations need some care, but I would
prefer to use a custom free list or lazy allocation instead of mmap.
This would only help if the elements on the free list were allocated using mmap,
wouldn't it? The issue is that RCU delays the freeing, so the number of concurrent
allocations is high and then a bunch is freed at once. If the memory were malloced
it would still cause trouble.
The free list should improve reuse and fragmentation. I'll take a look at
lazy allocation of subpages, too.
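To illustrate the direction (a sketch only, not the actual exec.c change): the
idea is that elements released after an RCU grace period go onto a free list and
get reused by the next allocation, instead of being handed back to malloc in a
burst and fragmenting the heap.

#include <stdlib.h>

/* Hypothetical node type standing in for the real exec.c structures. */
typedef struct Node {
    struct Node *next;
    /* ... payload ... */
} Node;

static Node *free_list;   /* real code would need appropriate locking */

static Node *node_alloc(void)
{
    Node *n = free_list;
    if (n != NULL) {
        free_list = n->next;          /* reuse a previously released node */
        return n;
    }
    return calloc(1, sizeof(*n));     /* fall back to the allocator */
}

/* Called from the RCU reclaim path, once no reader can still see the node. */
static void node_release(Node *n)
{
    n->next = free_list;
    free_list = n;
}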
Ok, that would be good. And for the PhysPageMap, do we use mmap and try to avoid
the realloc?
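For illustration, the mmap-instead-of-realloc idea could look roughly like this
(a sketch under the assumption that a reasonable upper bound for the table can be
reserved up front; not the actual PhysPageMap code): reserve a large virtual
range once, commit pages as the table grows, and return everything to the kernel
with one munmap, so growth never copies data.

#include <stddef.h>
#include <sys/mman.h>

#define RESERVE_BYTES (256u << 20)   /* assumed upper bound for the table */

typedef struct {
    void  *base;        /* reserved virtual range */
    size_t committed;   /* bytes already made accessible */
} GrowableMap;

static int map_init(GrowableMap *m)
{
    m->base = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    m->committed = 0;
    return m->base == MAP_FAILED ? -1 : 0;
}

/* Grow in place: no realloc, no copy.  new_size would be rounded up to a page
 * boundary in real code. */
static int map_grow(GrowableMap *m, size_t new_size)
{
    if (new_size <= m->committed) {
        return 0;
    }
    if (mprotect(m->base, new_size, PROT_READ | PROT_WRITE) != 0) {
        return -1;
    }
    m->committed = new_size;
    return 0;
}

static void map_destroy(GrowableMap *m)
{
    munmap(m->base, RESERVE_BYTES);   /* the whole range goes back to the kernel */
}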
Peter
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Kind regards
Peter Lieven