Hi All,

I'm trying to troubleshoot a strange issue with my Ceph cluster.

We're running Ceph version 0.72.2.
All nodes are Dell R515s with a 6-core AMD CPU, 32GB RAM, 12 x 3TB 
nearline SAS drives, and 2 x 100GB Intel DC S3700 SSDs for journals.
All pools have a replica count of 2 or higher; the metadata pool, for instance, is set to 3.

I have 55 OSDs in the cluster across 5 nodes. When I restart the OSDs on a 
single node (any node), the load average of that node shoots up to 230+ and the 
whole cluster starts blocking IO requests until things settle down, after which 
it's fine again.

Any ideas on why the load average spikes so badly and the cluster starts blocking IO?
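
In case it's useful, here's roughly the restart sequence I've been using, plus 
the recovery throttling I've tried as a stopgap. (The service invocation is 
sysvinit-style, so adjust for upstart, and the injectargs values are just 
numbers I picked to experiment with, lower than what's in my ceph.conf below.)

        # Stop the monitors marking OSDs out during the restart,
        # so peering isn't compounded by a full rebalance.
        ceph osd set noout

        # Restart all OSDs on this node.
        service ceph restart osd

        # Temporarily throttle backfill/recovery while things re-peer.
        ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

        # Watch the cluster until everything is active+clean again.
        ceph -w

        # Then allow out-marking again.
        ceph osd unset noout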


<snips from my ceph.conf>
[osd]
        osd data = /var/ceph/osd.$id
        osd journal size = 15000
        osd mkfs type = xfs
        osd mkfs options xfs = "-i size=2048 -f"
        osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k"
        osd max backfills = 5
        osd recovery max active = 3

[osd.0]
        host = pbnerbd01
        public addr = 10.100.96.10
        cluster addr = 10.100.128.10
        osd journal = /dev/disk/by-id/scsi-36b8ca3a0eaa2660019deaf8d3a40bec4-part1
        devs = /dev/sda4
</end>
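
For reference, I've also been double-checking what the running daemons actually 
picked up, via the admin socket (the socket path is the stock /var/run/ceph 
location; osd.0 is just an example):

        ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_max_backfills|osd_recovery_max_active'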

Thanks,
Quenten
