On Mon, Jul 11, 2016 at 7:18 PM, Lionel Bouton
<lionel-subscript...@bouton.name> wrote:
> On 11/07/2016 04:48, 한승진 wrote:
>> Hi cephers,
>>
>> I need your help with some issues.
>>
>> The Ceph cluster version is Jewel (10.2.1), and the filesystem is btrfs.
>>
>> I run 1 mon and 48 OSDs across 4 nodes (each node has 12 OSDs).
>>
>> I've experienced one of the OSDs killing itself.
>>
>> It always issues a suicide timeout message first.
>
> This is probably a fragmentation problem: typical rbd access patterns
> cause heavy BTRFS fragmentation.
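[Editor's note: one way to gauge the fragmentation Lionel describes is
filefrag, which prints an extent count per file. A minimal sketch,
assuming the default filestore layout under /var/lib/ceph/osd/ and OSD
id 0 -- adjust both to your cluster:]

    # Show the 10 most fragmented object files on OSD 0.
    # filefrag prints "<file>: <N> extents found" for each file,
    # so sorting numerically on the second field ranks by extent count.
    find /var/lib/ceph/osd/ceph-0/current -type f -print0 \
      | xargs -0 filefrag 2>/dev/null \
      | sort -t: -k2 -rn \
      | head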
To the extent that operations take over 120 seconds to complete? Really?
I have no experience with BTRFS, but I had heard that its performance
can "fall off a cliff" -- I didn't know it was that bad.

--
Cheers,
Brad

> If you already use the autodefrag mount option, you can try this
> instead, which performs much better for us:
> https://github.com/jtek/ceph-utils/blob/master/btrfs-defrag-scheduler.rb
>
> Note that it can take some time to fully defragment the filesystems,
> but it shouldn't put more stress on them than autodefrag while doing
> so.
>
> If you don't already use it, set:
>     filestore btrfs snap = false
> in ceph.conf and restart your OSDs.
>
> Finally, if you keep journals on the filesystem rather than on
> dedicated partitions, you'll have to recreate them with the NoCow
> attribute (there's no way to defragment journals that doesn't kill
> performance otherwise).
>
> Best regards,
>
> Lionel
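[Editor's note: autodefrag is enabled per mount. A sketch of what that
looks like -- the device and mount point below are illustrative only:]

    # /etc/fstab entry (example device and mount point)
    /dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  noatime,autodefrag  0  0

    # or enable it on a live filesystem without unmounting:
    mount -o remount,autodefrag /var/lib/ceph/osd/ceph-0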
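[Editor's note: the snapshot setting Lionel quotes goes in the [osd]
section of ceph.conf. How you restart the OSDs depends on your init
system; the systemd unit shown here is an assumption for a Jewel-era
install:]

    # ceph.conf
    [osd]
        filestore btrfs snap = false

    # then restart each OSD, e.g. under systemd:
    systemctl restart ceph-osd@0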
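[Editor's note: a sketch of recreating a file-based journal with the
NoCow attribute set, for a single OSD (id 0); paths assume the default
filestore layout. Note that chattr +C only takes effect on a new, empty
file, which is why the journal is removed and touched before the
attribute is applied:]

    systemctl stop ceph-osd@0
    ceph-osd -i 0 --flush-journal       # flush pending writes first
    rm /var/lib/ceph/osd/ceph-0/journal
    touch /var/lib/ceph/osd/ceph-0/journal
    chattr +C /var/lib/ceph/osd/ceph-0/journal   # NoCow on the empty file
    ceph-osd -i 0 --mkjournal           # recreate the journal in place
    systemctl start ceph-osd@0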