/etc/libvirt/qemu.conf:
max_files=XXXX

I expect this should always work, even on systemd-b0rked systems... Only solves the problem for QEMU, not for other librbd users.
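A minimal sketch of the change (32768 is an arbitrary example value, not a
tested recommendation; scale it to your OSD count and volumes per VM):

  # raise the per-guest fd limit for QEMU processes spawned by libvirt
  echo 'max_files = 32768' >> /etc/libvirt/qemu.conf
  systemctl restart libvirtd   # running guests must be stopped/started to pick it up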
Jan

> On 03 Sep 2015, at 14:48, Vasiliy Angapov <anga...@gmail.com> wrote:
>
> And what should those of us with systemd do? Because systemd totally
> ignores limits.conf and manages limits on a per-service basis...
> Which services should actually be tuned WRT LimitNOFILE?
> Or should DefaultLimitNOFILE be increased in /etc/systemd/system.conf?
>
> Thanks in advance!
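For the per-service route, a minimal sketch of a drop-in (assuming libvirtd
is the service to tune; 65536 is an arbitrary example value):

  mkdir -p /etc/systemd/system/libvirtd.service.d
  printf '[Service]\nLimitNOFILE=65536\n' \
      > /etc/systemd/system/libvirtd.service.d/limits.conf
  systemctl daemon-reload
  systemctl restart libvirtd

Whether the QEMU children actually inherit this depends on how libvirt
launches them, which is why the qemu.conf max_files route above is the more
direct fix. The global alternative is DefaultLimitNOFILE in
/etc/systemd/system.conf, which needs a reboot (or systemctl daemon-reexec)
to take effect.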
> 2015-09-03 17:46 GMT+08:00 Jan Schermer <j...@schermer.cz>:
> You're like the 5th person here (including me) that has been hit by this.
>
> Could I get some input from someone using Ceph with RBD and thousands of
> OSDs? How high did you have to go?
>
> I only have ~200 OSDs and I had to bump the limit up to 10000 for VMs that
> have multiple volumes attached; this doesn't seem right. I understand this
> is the effect of striping a volume across multiple PGs, but shouldn't this
> be more limited or somehow garbage-collected?
>
> And to get deeper - I suppose there will be one connection from QEMU to an
> OSD for each NCQ queue? Or how does this work? blk-mq will likely be
> different again... Or is it decoupled from the virtio side of things by
> the RBD cache, if that's enabled?
>
> Anyway, out of the box, at least on OpenStack installations:
> 1) anyone with more than a few OSDs should really bump this up by default.
> 2) librbd should handle this situation gracefully by recycling
> connections, instead of hanging.
> 3) at the very least we should get a warning somewhere (in the
> libvirt/qemu log) - I don't think there's anything when the issue hits.
>
> Should I make tickets for this?
>
> Jan
>
>> On 03 Sep 2015, at 02:57, Rafael Lopez <rafael.lo...@monash.edu> wrote:
>>
>> Hi Jan,
>>
>> Thanks for the advice, you hit the nail on the head.
>>
>> I checked the limits and watched the number of fds, and as it reached the
>> soft limit (1024) that's when the transfer came to a grinding halt and
>> the VM started locking up.
>>
>> After your reply I also did some more googling and found another old
>> thread:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-December/026187.html
>>
>> I increased max_files in qemu.conf and restarted libvirtd and the VM (as
>> per Dan's solution in the thread above), and now it seems to be happy
>> copying files of any size to the rbd. I confirmed the fd count is going
>> past the previous soft limit of 1024 as well.
>>
>> Thanks again!!
>> Raf
>>
>> On 2 September 2015 at 18:44, Jan Schermer <j...@schermer.cz> wrote:
>> 1) Take a look at the number of file descriptors the QEMU process is
>> using; I think you are over the limits.
>>
>> pid=$(pgrep -f qemu | head -1)   # or however you identify the qemu process
>>
>> cat /proc/$pid/limits
>> echo /proc/$pid/fd/* | wc -w
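To watch the count against the limit live while reproducing the hang (same
$pid as above, plain procfs):

  grep 'open files' /proc/$pid/limits               # soft and hard limits
  while sleep 5; do ls /proc/$pid/fd | wc -l; done  # live fd count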
>>
>> 2) Jumbo frames may be the cause: are they enabled on the rest of the
>> network? In any case, get rid of NetworkManager ASAP and set it manually,
>> though it looks like your NIC might not support them.
>>
>> Jan
>>
>> > On 02 Sep 2015, at 01:44, Rafael Lopez <rafael.lo...@monash.edu> wrote:
>> >
>> > Hi ceph-users,
>> >
>> > Hoping to get some help with a tricky problem. I have a rhel7.1 VM
>> > guest (the host machine is also rhel7.1) with its root disk presented
>> > from ceph 0.94.2-0 (rbd) using libvirt.
>> >
>> > The VM also has a second rbd for storage, presented from the same ceph
>> > cluster, also using libvirt.
>> >
>> > The VM boots fine, with no apparent issues on the OS root rbd. I am
>> > able to mount the storage disk in the VM and create a file system. I
>> > can even transfer small files to it. But when I try to transfer
>> > moderately sized files, e.g. greater than 1GB, it slows to a grinding
>> > halt and eventually locks up the whole system, generating the kernel
>> > messages below.
>> >
>> > I have googled some *similar* issues, but haven't come across any solid
>> > advice/fix. So far I have tried modifying the libvirt disk cache
>> > settings, the latest mainline kernel (4.2+), and different file systems
>> > (ext4, xfs, zfs); all produce similar results. I suspect it may be
>> > network related: when I was using the mainline kernel and transferring
>> > some files to the storage disk, this message came up and the transfer
>> > seemed to stop at the same time:
>> >
>> > Sep 1 15:31:22 nas1-rds NetworkManager[724]: <error> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22) Invalid argument
>> >
>> > I think the key piece of troubleshooting info is that it seems to be OK
>> > for files under 1GB.
>> >
>> > Any ideas would be appreciated.
>> >
>> > Cheers,
>> > Raf
>> >
>> > Sep 1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for more than 120 seconds.
>> > Sep 1 16:04:15 nas1-rds kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Sep 1 16:04:15 nas1-rds kernel: kworker/u8:1 D ffff88023fd93680 0 60 2 0x00000000
>> > Sep 1 16:04:15 nas1-rds kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:80)
>> > Sep 1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046 ffff8802313c4440 ffff880230c13fd8
>> > Sep 1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8 ffff8802313c4440 ffff88023fd93f48
>> > Sep 1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08 ffffe8ffffd80ec0 ffff88022e827590
>> > Sep 1 16:04:15 nas1-rds kernel: Call Trace:
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>] blk_mq_get_tag+0xbf/0xf0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>] __blk_mq_alloc_request+0x1b/0x1f0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>] blk_mq_map_request+0x181/0x1e0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>] blk_sq_make_request+0x9a/0x380
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ? generic_make_request_checks+0x24f/0x380
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>] generic_make_request+0xe2/0x130
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>] submit_bio+0x71/0x150
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>] ext4_io_submit+0x25/0x50 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>] ext4_bio_write_page+0x159/0x2e0 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>] mpage_submit_page+0x5d/0x80 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>] mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>] ext4_writepages+0x733/0xd60 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>] bdi_writeback_workfn+0x1cc/0x460
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
>>
>> --
>> Rafael Lopez
>> Data Storage Administrator
>> Servers & Storage (eSolutions)
>> +61 3 990 59118
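On the NetworkManager MTU error quoted above: a sketch of setting the MTU
by hand (eth0/9000 taken from that log line; only worth trying if the NIC
and every switch port in the path actually support jumbo frames):

  ip link set dev eth0 mtu 9000
  ip link show eth0   # check that the mtu value actually stuck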
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com