@Nick: if you can recreate the librbd memory growth, any chance you can help test a potential fix [1]?
[1] https://github.com/ceph/ceph/pull/24297

For reference: https://bugs.launchpad.net/bugs/1701449

Title:
high memory usage when using rbd with client caching

Status in QEMU:
New

Bug description:

Hi,

we are experiencing quite high memory usage of a single qemu (used with KVM) process when using RBD with client caching as the disk backend. We are testing with 3 GB memory qemu virtual machines and a 128 MB RBD client cache. When running 'fio' in the virtual machine you can see that after some time the machine uses a lot more memory (RSS) on the hypervisor than it should. We have seen memory overheads of 250% on real production machines, not just in artificial fio tests. I reproduced this with qemu version 2.9 as well.

Here are the contents of our ceph.conf on the hypervisor:

"""
[client]
rbd cache writethrough until flush = False
rbd cache max dirty = 100663296
rbd cache size = 134217728
rbd cache target dirty = 50331648
"""

How to reproduce:
* create a virtual machine with an RBD-backed disk (100 GB or so)
* install a Linux distribution on it (we are using Ubuntu)
* install fio (apt-get install fio)
* run fio multiple times with (e.g.) the following job file:

"""
# This job file tries to mimic the Intel IOMeter File Server Access Pattern
[global]
description=Emulation of Intel IOmeter File Server Access Pattern
randrepeat=0
filename=/root/test.dat
# IOMeter defines the server loads as the following:
# iodepth=1     Linear
# iodepth=4     Very Light
# iodepth=8     Light
# iodepth=64    Moderate
# iodepth=256   Heavy
iodepth=8
size=80g
direct=0
ioengine=libaio

[iometer]
stonewall
bs=4M
rw=randrw

[iometer_just_write]
stonewall
bs=4M
rw=write

[iometer_just_read]
stonewall
bs=4M
rw=read
"""

You can measure the virtual machine's RSS usage on the hypervisor with:

virsh dommemstat <machine name> | grep rss

or, if you are not using libvirt:

grep RSS /proc/<PID of qemu process>/status

When the RBD client cache is switched off, everything is fine again: the process no longer uses that much memory. There is already a ticket on the ceph bug tracker for this: http://tracker.ceph.com/issues/20054. However, I can reproduce this memory behaviour only when using qemu (maybe it uses librbd in a special way?). Running 'fio' directly with its rbd engine does not result in such high memory usage.
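To watch the RSS growth over time rather than taking a single sample, a simple loop around the reporter's libvirt command on the hypervisor is enough; this is just a sketch, with <machine name> and the 30-second interval as placeholders:

"""
# Sample the guest's RSS on the hypervisor every 30 seconds while fio runs inside the guest.
while true; do
    date
    virsh dommemstat <machine name> | grep rss
    sleep 30
done
"""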
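The reporter's observation that the growth disappears with the client cache off can be checked by flipping the cache setting in the hypervisor's ceph.conf; a minimal sketch (the rest of the [client] section can stay as quoted above):

"""
[client]
# disable the librbd client cache entirely for comparison runs
rbd cache = false
"""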
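For the direct-librbd comparison (running fio with its rbd engine, which the reporter says does not show the growth), a job file along these lines should work; this is a sketch, and the pool name, image name and cephx user below are placeholder values for an already existing test image:

"""
# Drive an existing RBD image directly through librbd, bypassing qemu.
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio-test
bs=4M
iodepth=8

[rbd_randrw]
rw=randrw
size=80g
"""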