We don't have thousands, but these RBDs are in a pool backed by ~600 OSDs. I can see the fd count is up well past 10k, closer to 15k, when I use a decent number of RBDs (e.g. 16 or 32), and it seems to increase the bigger the file I write. Procs are almost 30k when writing a 50GB file across that number of OSDs.
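
For anyone else chasing this: a quick way to watch the descriptor and thread counts climb while a transfer is running is something like the loop below. It's only a rough sketch built on the /proc checks Jan posted (quoted below), and it assumes a single qemu-kvm process that the pgrep pattern matches; adjust for your setup.

  pid=$(pgrep -o qemu-kvm)                        # oldest matching qemu process
  while sleep 5; do
      fds=$(ls /proc/$pid/fd | wc -l)             # open file descriptors
      threads=$(ls /proc/$pid/task | wc -l)       # threads in the qemu process
      soft=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)   # current soft nofile limit
      echo "$(date +%T)  fds=$fds  threads=$threads  soft_limit=$soft"
  done

Watching the fd count approach the soft limit during a large write lines up with the behaviour described below (transfers grinding to a halt around 1024).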
the change in qemu.conf worked for me, using rhel7.1 with systemd (a rough sketch of the relevant settings follows the quoted thread below).

On 3 September 2015 at 19:46, Jan Schermer <j...@schermer.cz> wrote:

> You're like the 5th person here (including me) that was hit by this.
>
> Could I get some input from someone using CEPH with RBD and thousands of OSDs? How high did you have to go?
>
> I only have ~200 OSDs and I had to bump the limit up to 10000 for VMs that have multiple volumes attached; this doesn't seem right. I understand this is the effect of striping a volume across multiple PGs, but shouldn't this be more limited or somehow garbage collected?
>
> And to get deeper - I suppose there will be one connection from QEMU to OSD for each NCQ queue? Or how does this work? blk-mq will likely be different again... Or is it decoupled from the virtio side of things by RBD cache, if that's enabled?
>
> Anyway, out of the box, at least on OpenStack installations:
> 1) anyone having more than a few OSDs should really bump this up by default.
> 2) librbd should handle this situation gracefully by recycling connections, instead of hanging.
> 3) at least we should get a warning somewhere (in the libvirt/qemu log) - I don't think there's anything when the issue hits.
>
> Should I make tickets for this?
>
> Jan
>
> On 03 Sep 2015, at 02:57, Rafael Lopez <rafael.lo...@monash.edu> wrote:
>
> Hi Jan,
>
> Thanks for the advice, hit the nail on the head.
>
> I checked the limits and watched the no. of fds, and as it reached the soft limit (1024) that's when the transfer came to a grinding halt and the vm started locking up.
>
> After your reply I also did some more googling and found another old thread:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-December/026187.html
>
> I increased max_files in qemu.conf and restarted libvirtd and the VM (as per Dan's solution in the thread above), and now it seems to be happy copying any size files to the rbd. Confirmed the fd count is going past the previous soft limit of 1024 also.
>
> Thanks again!!
> Raf
>
> On 2 September 2015 at 18:44, Jan Schermer <j...@schermer.cz> wrote:
>
>> 1) Take a look at the number of file descriptors the QEMU process is using, I think you are over the limits.
>>
>> pid=pid of qemu process
>>
>> cat /proc/$pid/limits
>> echo /proc/$pid/fd/* | wc -w
>>
>> 2) Jumbo frames may be the cause, are they enabled on the rest of the network? In any case, get rid of NetworkManager ASAP and set it manually, though it looks like your NIC might not support them.
>>
>> Jan
>>
>> > On 02 Sep 2015, at 01:44, Rafael Lopez <rafael.lo...@monash.edu> wrote:
>> >
>> > Hi ceph-users,
>> >
>> > Hoping to get some help with a tricky problem. I have a rhel7.1 VM guest (host machine also rhel7.1) with a root disk presented from ceph 0.94.2-0 (rbd) using libvirt.
>> >
>> > The VM also has a second rbd for storage, presented from the same ceph cluster, also using libvirt.
>> >
>> > The VM boots fine, no apparent issues with the OS root rbd. I am able to mount the storage disk in the VM and create a file system. I can even transfer small files to it. But when I try to transfer a moderate-size file, e.g. greater than 1GB, it slows to a grinding halt, eventually locks up the whole system, and generates the kernel messages below.
>> >
>> > I have googled some *similar* issues around, but haven't come across any solid advice/fix.
>> > So far I have tried modifying the libvirt disk cache settings, using the latest mainline kernel (4.2+), and different file systems (ext4, xfs, zfs); all produce similar results. I suspect it may be network related, as when I was using the mainline kernel I was transferring some files to the storage disk and this message came up, and the transfer seemed to stop at the same time:
>> >
>> > Sep 1 15:31:22 nas1-rds NetworkManager[724]: <error> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22) Invalid argument
>> >
>> > I think maybe the key info for troubleshooting is that it seems to be OK for files under 1GB.
>> >
>> > Any ideas would be appreciated.
>> >
>> > Cheers,
>> > Raf
>> >
>> >
>> > Sep 1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for more than 120 seconds.
>> > Sep 1 16:04:15 nas1-rds kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Sep 1 16:04:15 nas1-rds kernel: kworker/u8:1 D ffff88023fd93680 0 60 2 0x00000000
>> > Sep 1 16:04:15 nas1-rds kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:80)
>> > Sep 1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046 ffff8802313c4440 ffff880230c13fd8
>> > Sep 1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8 ffff8802313c4440 ffff88023fd93f48
>> > Sep 1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08 ffffe8ffffd80ec0 ffff88022e827590
>> > Sep 1 16:04:15 nas1-rds kernel: Call Trace:
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>] blk_mq_get_tag+0xbf/0xf0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>] __blk_mq_alloc_request+0x1b/0x1f0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>] blk_mq_map_request+0x181/0x1e0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>] blk_sq_make_request+0x9a/0x380
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ? generic_make_request_checks+0x24f/0x380
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>] generic_make_request+0xe2/0x130
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>] submit_bio+0x71/0x150
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>] ext4_io_submit+0x25/0x50 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>] ext4_bio_write_page+0x159/0x2e0 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>] mpage_submit_page+0x5d/0x80 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>] mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>] ext4_writepages+0x733/0xd60 [ext4]
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>] bdi_writeback_workfn+0x1cc/0x460
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
>> > Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> --
> Rafael Lopez
> Data Storage Administrator
> Servers & Storage (eSolutions)
> +61 3 990 59118

--
Rafael Lopez
Data Storage Administrator
Servers & Storage (eSolutions)
+61 3 990 59118
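
For reference, the qemu.conf change mentioned at the top looks roughly like this on rhel7.1 with systemd. The numbers are illustrative only (size them to your OSD and volume count), and max_processes is optional:

  # /etc/libvirt/qemu.conf
  max_files = 32768        # open-file (nofile) limit libvirt applies to each qemu process
  max_processes = 131072   # process/thread (nproc) limit, if you also see high thread counts

  systemctl restart libvirtd
  # then shut the guest down and start it again so the new limits apply to the qemu process

Restarting libvirtd alone is not enough; the limits only take effect once the guest's qemu process is started fresh, which matches what was reported in the thread above.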
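On the jumbo-frames point in the quoted thread: the NetworkManager error ("Invalid argument" when setting the mtu to 9000) usually means the driver won't accept that MTU, so it's worth testing by hand before digging further into ceph. A quick sketch (eth0 is just the interface named in the quoted error; substitute your own):

  ip link set dev eth0 mtu 9000                 # fails with "Invalid argument" if the NIC/driver can't do 9000
  ip link show dev eth0 | grep -o 'mtu [0-9]*'  # confirm what is actually applied
  # To persist it without NetworkManager on RHEL 7, set MTU=9000 (and NM_CONTROLLED=no)
  # in /etc/sysconfig/network-scripts/ifcfg-eth0 and restart the interface.

As Jan noted, the MTU also has to be supported end to end on the switch ports and the other ceph nodes, not just on this guest.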