On 28.06.2016 at 16:43, Peter Lieven wrote:
On 28.06.2016 at 14:56, Dr. David Alan Gilbert wrote:
* Peter Lieven (p...@kamp.de) wrote:
On 28.06.2016 at 14:29, Paolo Bonzini wrote:
On 28.06.2016 at 13:37, Paolo Bonzini wrote:
On 28/06/2016 11:01, Peter Lieven wrote:
I recently found that Qemu is using several hundred megabytes of RSS memory
more than older versions such as Qemu 2.2.0. So I started tracing memory
allocation and found 2 major reasons for this.
1) We changed the qemu coroutine pool to have a per-thread and a global release
pool. The chosen pool size and the changed algorithm can lead to up to 192 free
coroutines with just a single iothread, each of them holding 1 MB of stack memory.
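To make the accumulation concrete, here is a toy sketch of such a two-level pool
(not the QEMU code; the per-thread and global limits are assumptions chosen only
so that they add up to the 192 free coroutines mentioned above):

#include <stddef.h>
#include <stdlib.h>

#define STACK_SIZE        (1024 * 1024)  /* 1 MB per coroutine stack */
#define THREAD_POOL_MAX   64             /* assumed per-thread limit */
#define RELEASE_POOL_MAX  128            /* assumed global release-pool limit */

typedef struct Coroutine {
    struct Coroutine *next;
    void *stack;                          /* 1 MB kept alive while pooled */
} Coroutine;

static __thread Coroutine *thread_pool;   /* per-thread pool */
static __thread unsigned   thread_pool_size;
static Coroutine          *release_pool;  /* global pool (locking omitted) */
static unsigned            release_pool_size;

/* On termination a coroutine is parked rather than freed, so with both pools
 * full a single-iothread process can sit on 64 + 128 = 192 idle coroutines,
 * i.e. roughly 192 MB of stack memory. */
static void coroutine_recycle(Coroutine *co)
{
    if (thread_pool_size < THREAD_POOL_MAX) {
        co->next = thread_pool;
        thread_pool = co;
        thread_pool_size++;
    } else if (release_pool_size < RELEASE_POOL_MAX) {
        co->next = release_pool;
        release_pool = co;
        release_pool_size++;
    } else {
        free(co->stack);                  /* only now does the stack go away */
        free(co);
    }
}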
But the fix, as you correctly note, is to reduce the stack size. It
would be nice to compile block-obj-y with -Wstack-usage=2048 too.
To reveal if there are any big stack allocations in the block layer?
Yes. Most should be fixed by now, but a handful are probably still there.
(definitely one in vvfat.c).
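For illustration, the kind of allocation -Wstack-usage is meant to catch looks
roughly like this (a made-up example, not the actual vvfat.c code; with gcc and
-Wstack-usage=2048 the large local buffer triggers the warning):

#include <stdint.h>
#include <string.h>

/* Hypothetical "stack eater": compiled with gcc -Wstack-usage=2048, this
 * function is reported because its local buffer alone is far above the
 * 2048-byte limit. */
static void fill_sector_example(uint8_t *out, size_t out_len)
{
    uint8_t scratch[64 * 1024];          /* 64 KiB on the stack */

    memset(scratch, 0, sizeof(scratch));
    /* ... build the sector contents in scratch ... */
    memcpy(out, scratch, out_len < sizeof(scratch) ? out_len : sizeof(scratch));
}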
It seems that reducing the coroutine stack to 64kB breaks live migration in some
(non-reproducible) cases.
Does it hit the guard page?
What would that look like? I get segfaults like this:
segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in
qemu-system-x86_64[555ab6f2c000+794000]
Most of the time it is error 6, sometimes error 7. The segfault is near the sp.
A backtrace would be good.
Here we go. My old friend nc_sendv_compat ;-)
This has already been fixed in master. My test systems use an older Qemu ;-)
Peter
Again the question: would you go for reducing the stack size and eliminating all
stack eaters?
The static netbuf in nc_sendv_compat is no problem.
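For context, the pattern behind that remark looks roughly like this (a minimal
sketch, assuming a ~64 KiB NET_BUFSIZE-style constant; names are simplified and
this is not the actual net/net.c code):

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define NET_BUFSIZE_SKETCH (64 * 1024)  /* assumed size, matching the ~64 KiB
                                         * buffer visible in the backtrace */

static ssize_t sendv_compat_sketch(const struct iovec *iov, int iovcnt)
{
    /* Before the fix this was a plain local array, i.e. 64 KiB of coroutine
     * stack per call; declaring it static removes it from the stack budget. */
    static uint8_t buf[NET_BUFSIZE_SKETCH];
    size_t offset = 0;

    for (int i = 0; i < iovcnt; i++) {
        size_t len = iov[i].iov_len;
        if (len > sizeof(buf) - offset) {
            len = sizeof(buf) - offset;
        }
        memcpy(buf + offset, iov[i].iov_base, len);
        offset += len;
    }
    /* The real code hands buf to the single-buffer send path here. */
    return offset;
}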
And: I would go for adding the guard page without MAP_GROWSDOWN and mmapping the
rest of the stack with this flag if available (see the sketch below). That way we
are safe on non-Linux systems, on Linux before 3.9, or with merged memory regions.
Peter
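A minimal sketch of that layout could look like the following (names are made up,
this is not the QEMU implementation): a PROT_NONE guard page that never depends
on MAP_GROWSDOWN, with the usable stack above it; where MAP_GROWSDOWN exists it
could additionally be applied to the usable part so pages are only committed as
they are touched.

#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: reserve guard page + stack in one anonymous mapping, then make only
 * the part above the guard page accessible.  An overflow of the coroutine
 * stack (which grows downward) hits the PROT_NONE page and faults, regardless
 * of whether MAP_GROWSDOWN is available. */
static void *alloc_coroutine_stack_sketch(size_t stack_size)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = stack_size + pagesz;

    void *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }

    /* Usable stack: everything above the guard page.  On Linux >= 3.9 this
     * part could additionally use MAP_GROWSDOWN so pages are committed
     * lazily; the guard page itself never relies on that flag. */
    if (mprotect((char *)base + pagesz, stack_size,
                 PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }

    /* Lowest usable address; the stack pointer starts at the top of this
     * region and grows down toward the guard page. */
    return (char *)base + pagesz;
}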
---
Program received signal SIGSEGV, Segmentation fault.
0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
at net/net.c:701
(gdb) bt full
#0 0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
at net/net.c:701
buf = '\000' <repeats 65890 times>...
buffer = 0x0
offset = 0
#1 0x0000555555a2f058 in qemu_deliver_packet_iov (sender=0x5555565a46b0,
flags=0, iov=0x7ffff7e98d20, iovcnt=1, opaque=0x555557802370)
at net/net.c:745
nc = 0x555557802370
ret = 21845
#2 0x0000555555a3132d in qemu_net_queue_deliver (queue=0x555557802590,
sender=0x5555565a46b0, flags=0, data=0x55555659e2a8 "", size=74)
at net/queue.c:163
ret = -1
iov = {iov_base = 0x55555659e2a8, iov_len = 74}
#3 0x0000555555a3178b in qemu_net_queue_flush (queue=0x555557802590)
at net/queue.c:260
packet = 0x55555659e280
ret = 21845
#4 0x0000555555a2eb7a in qemu_flush_or_purge_queued_packets (
nc=0x555557802370, purge=false) at net/net.c:629
No locals.
#5 0x0000555555a2ebe4 in qemu_flush_queued_packets (nc=0x555557802370)
at net/net.c:642
No locals.
#6 0x00005555557747b7 in virtio_net_set_status (vdev=0x555556fb32a8,
status=7 '\a') at /usr/src/qemu-2.5.0/hw/net/virtio-net.c:178
ncs = 0x555557802370
queue_started = true
n = 0x555556fb32a8
__func__ = "virtio_net_set_status"
q = 0x555557308b50
i = 0
queue_status = 7 '\a'
#7 0x0000555555795501 in virtio_set_status (vdev=0x555556fb32a8, val=7 '\a')
at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:618
k = 0x55555657eb40
__func__ = "virtio_set_status"
#8 0x00005555557985e6 in virtio_vmstate_change (opaque=0x555556fb32a8,
running=1, state=RUN_STATE_RUNNING)
at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:1539
vdev = 0x555556fb32a8
qbus = 0x555556fb3240
__func__ = "virtio_vmstate_change"
k = 0x555556570420
backend_run = true
#9 0x00005555558592ae in vm_state_notify (running=1, state=RUN_STATE_RUNNING)
at vl.c:1601
e = 0x555557320cf0
next = 0x555557af4c40
#10 0x000055555585737d in vm_start () at vl.c:756
requested = RUN_STATE_MAX
#11 0x0000555555a209ec in process_incoming_migration_co (opaque=0x5555566a1600)
at migration/migration.c:392
f = 0x5555566a1600
local_err = 0x0
mis = 0x5555575ab0e0
ps = POSTCOPY_INCOMING_NONE
ret = 0
#12 0x0000555555b61efd in coroutine_trampoline (i0=1465036928, i1=21845)
at util/coroutine-ucontext.c:80
arg = {p = 0x55555752b080, i = {1465036928, 21845}}
self = 0x55555752b080
co = 0x55555752b080
#13 0x00007ffff5cb7800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#14 0x00007fffffffcb40 in ?? ()
No symbol table info available.
#15 0x0000000000000000 in ?? ()
No symbol table info available.
Dave
2) Between Qemu 2.2.0 and 2.3.0, RCU was introduced, which led to delayed freeing
of memory. This led to higher heap allocations that could not effectively be
returned to the kernel (most likely due to fragmentation).
I agree that some of the exec.c allocations need some care, but I would
prefer to use a custom free list or lazy allocation instead of mmap.
This would only help if the elements on the free list were allocated using mmap,
wouldn't it? The issue is that RCU delays the freeing, so the number of concurrent
allocations is high and then a bunch is freed at once. If the memory were malloced
it would still cause trouble.
The free list should improve reuse and fragmentation. I'll take a look at
lazy allocation of subpages, too.
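To illustrate the direction (a sketch only, not the actual exec.c change): the
idea is that elements released after an RCU grace period go onto a free list and
get reused by the next allocation, instead of being handed back to malloc in a
burst and fragmenting the heap.

#include <stdlib.h>

/* Hypothetical node type standing in for the real exec.c structures. */
typedef struct Node {
    struct Node *next;
    /* ... payload ... */
} Node;

static Node *free_list;   /* real code would need appropriate locking */

static Node *node_alloc(void)
{
    Node *n = free_list;
    if (n != NULL) {
        free_list = n->next;          /* reuse a previously released node */
        return n;
    }
    return calloc(1, sizeof(*n));     /* fall back to the allocator */
}

/* Called from the RCU reclaim path, once no reader can still see the node. */
static void node_release(Node *n)
{
    n->next = free_list;
    free_list = n;
}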
Ok, that would be good. And for the PhysPageMap, do we use mmap and try to avoid
the realloc?
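For illustration, the mmap-instead-of-realloc idea could look roughly like this
(a sketch under the assumption that a reasonable upper bound for the table can be
reserved up front; not the actual PhysPageMap code): reserve a large virtual
range once, commit pages as the table grows, and return everything to the kernel
with one munmap, so growth never copies data.

#include <stddef.h>
#include <sys/mman.h>

#define RESERVE_BYTES (256u << 20)   /* assumed upper bound for the table */

typedef struct {
    void  *base;        /* reserved virtual range */
    size_t committed;   /* bytes already made accessible */
} GrowableMap;

static int map_init(GrowableMap *m)
{
    m->base = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    m->committed = 0;
    return m->base == MAP_FAILED ? -1 : 0;
}

/* Grow in place: no realloc, no copy.  new_size would be rounded up to a page
 * boundary in real code. */
static int map_grow(GrowableMap *m, size_t new_size)
{
    if (new_size <= m->committed) {
        return 0;
    }
    if (mprotect(m->base, new_size, PROT_READ | PROT_WRITE) != 0) {
        return -1;
    }
    m->committed = new_size;
    return 0;
}

static void map_destroy(GrowableMap *m)
{
    munmap(m->base, RESERVE_BYTES);   /* the whole range goes back to the kernel */
}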
Peter
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Kind regards
Peter Lieven