On Mon, Jul 11, 2016 at 7:18 PM, Lionel Bouton
<lionel-subscript...@bouton.name> wrote:
> On 11/07/2016 04:48, 한승진 wrote:
>> Hi cephers,
>>
>> I need your help with some issues.
>>
>> The Ceph cluster version is Jewel (10.2.1), and the filesystem is btrfs.
>>
>> I run 1 mon and 48 OSDs across 4 nodes (each node has 12 OSDs).
>>
>> I've experienced one of the OSDs killing itself.
>>
>> It always issues a suicide timeout message first.
>
> This is probably a fragmentation problem: typical rbd access patterns
> cause heavy BTRFS fragmentation.
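[Editor's note: one way to gauge the fragmentation Lionel describes is
filefrag, which prints an extent count per file. A minimal sketch,
assuming the default filestore layout under /var/lib/ceph/osd/ and OSD
id 0 -- adjust both to your cluster:]

    # Show the 10 most fragmented object files on OSD 0.
    # filefrag prints "<file>: <N> extents found" for each file,
    # so sorting numerically on the second field ranks by extent count.
    find /var/lib/ceph/osd/ceph-0/current -type f -print0 \
      | xargs -0 filefrag 2>/dev/null \
      | sort -t: -k2 -rn \
      | head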
To the extent that operations take over 120 seconds to complete? Really?
I have no experience with BTRFS, but I had heard that its performance
can "fall off a cliff" -- I didn't know it was that bad.

--
Cheers,
Brad

> If you already use the autodefrag mount option, you can try this
> instead, which performs much better for us:
> https://github.com/jtek/ceph-utils/blob/master/btrfs-defrag-scheduler.rb
>
> Note that it can take some time to fully defragment the filesystems,
> but it shouldn't put more stress on them than autodefrag while doing
> so.
>
> If you don't already use it, set:
>     filestore btrfs snap = false
> in ceph.conf and restart your OSDs.
>
> Finally, if you keep journals on the filesystem rather than on
> dedicated partitions, you'll have to recreate them with the NoCow
> attribute (there's no way to defragment journals that doesn't kill
> performance otherwise).
>
> Best regards,
>
> Lionel
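[Editor's note: autodefrag is enabled per mount. A sketch of what that
looks like -- the device and mount point below are illustrative only:]

    # /etc/fstab entry (example device and mount point)
    /dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  noatime,autodefrag  0  0

    # or enable it on a live filesystem without unmounting:
    mount -o remount,autodefrag /var/lib/ceph/osd/ceph-0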
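[Editor's note: the snapshot setting Lionel quotes goes in the [osd]
section of ceph.conf. How you restart the OSDs depends on your init
system; the systemd unit shown here is an assumption for a Jewel-era
install:]

    # ceph.conf
    [osd]
        filestore btrfs snap = false

    # then restart each OSD, e.g. under systemd:
    systemctl restart ceph-osd@0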
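[Editor's note: a sketch of recreating a file-based journal with the
NoCow attribute set, for a single OSD (id 0); paths assume the default
filestore layout. Note that chattr +C only takes effect on a new, empty
file, which is why the journal is removed and touched before the
attribute is applied:]

    systemctl stop ceph-osd@0
    ceph-osd -i 0 --flush-journal       # flush pending writes first
    rm /var/lib/ceph/osd/ceph-0/journal
    touch /var/lib/ceph/osd/ceph-0/journal
    chattr +C /var/lib/ceph/osd/ceph-0/journal   # NoCow on the empty file
    ceph-osd -i 0 --mkjournal           # recreate the journal in place
    systemctl start ceph-osd@0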