That kind of behavior is usually caused by the OSDs getting busy enough
that they aren't answering heartbeats in a timely fashion. It can also
happen if you have any network flakiness and heartbeats are getting lost
because of that.
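
You can usually tell which of the two it is from the OSD logs on the
affected node. As a rough illustration (the paths and wildcard are just
examples, assuming the default log location):

    # peers complaining that an OSD stopped answering heartbeats
    grep "heartbeat_check: no reply" /var/log/ceph/ceph-osd.*.log
    # the flapping OSD noticing it was marked down while still alive
    grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log

If the "wrongly marked me down" messages line up with scrub activity, it's
almost certainly load rather than the OSD processes actually dying.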

I think (I'm not positive though) that increasing your heartbeat interval
may help. Also, looking at the number of threads you have for your OSDs,
that seems potentially problematic. If you've got 24 OSDs per machine and
each one is running 12 threads, that's 288 threads on 12 cores for just the
requests. Plus the disk threads, plus the filestore op threads... That
level of thread contention seems like it might be contributing to missing
the heartbeats. But again, that's conjecture. I've not worked with a setup
as dense as yours.
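
As a concrete sketch of the direction I'd try (the values here are only
illustrative, not something I've tested on a setup like yours), in the
[osd] section of ceph.conf:

    [osd]
    # give heartbeat replies a bit more slack while scrubs are running
    osd heartbeat interval = 20
    # scale the threading back so 24 OSDs aren't fighting over 12 cores
    osd op threads = 4
    filestore op threads = 4

I believe some of these only take effect on an OSD restart, so you'd want
to roll the change through one node at a time (or experiment at runtime
with "ceph tell osd.* injectargs" for the settings that support it).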

QH

On Fri, Aug 7, 2015 at 11:21 AM, Tuomas Juntunen <tuomas.juntu...@databasement.fi> wrote:

> Hi
>
>
>
> We are experiencing an annoying problem where scrubs make OSDs flap down
> and cause the Ceph cluster to be unusable for a couple of minutes.
>
>
>
> Our cluster consists of three nodes connected with 40Gbit InfiniBand using
> IPoIB, each with 2x 6-core X5670 CPUs and 64GB of memory.
>
> Each node has 6 SSDs serving as journals for 12 OSDs on 2TB disks (Fast
> pools) and another 12 OSDs on 4TB disks (Archive pools), which keep their
> journals on the same disk.
>
>
>
> It seems that our cluster is constantly scrubbing; we rarely see only
> active+clean. Below is the status at the moment.
>
>
>
>     cluster a2974742-3805-4cd3-bc79-765f2bddaefe
>
>      health HEALTH_OK
>
>      monmap e16: 4 mons at {lb1=10.20.60.1:6789/0,lb2=10.20.60.2:6789/0,nc1=10.20.50.2:6789/0,nc2=10.20.50.3:6789/0}
>
>             election epoch 1838, quorum 0,1,2,3 nc1,nc2,lb1,lb2
>
>      mdsmap e7901: 1/1/1 up {0=lb1=up:active}, 4 up:standby
>
>      osdmap e104824: 72 osds: 72 up, 72 in
>
>       pgmap v12941402: 5248 pgs, 9 pools, 19644 GB data, 4810 kobjects
>
>             59067 GB used, 138 TB / 196 TB avail
>
>                 5241 active+clean
>
>                    7 active+clean+scrubbing
>
>
>
> When the OSDs go down, the load on a node first goes high during scrubbing,
> and after that some OSDs go down and come back up about 30 seconds later.
> They are not really going down, but are marked as down. Then it takes
> around a couple of minutes for everything to be OK again.
>
>
>
> Any suggestions on how to fix this? We can’t go to production while this
> behavior exists.
>
>
>
> Our config is below:
>
>
>
> [global]
>
> fsid = a2974742-3805-4cd3-bc79-765f2bddaefe
>
> mon_initial_members = lb1,lb2,nc1,nc2
>
> mon_host = 10.20.60.1,10.20.60.2,10.20.50.2,10.20.50.3
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> filestore_xattr_use_omap = true
>
>
>
> osd pool default pg num = 128
>
> osd pool default pgp num = 128
>
>
>
> public network = 10.20.0.0/16
>
>
>
>         osd_op_threads = 12
>
>         osd_op_num_threads_per_shard = 2
>
>         osd_op_num_shards = 6
>
>         #osd_op_num_sharded_pool_threads = 25
>
>         filestore_op_threads = 12
>
>         ms_nocrc = true
>
>         filestore_fd_cache_size = 64
>
>         filestore_fd_cache_shards = 32
>
>         ms_dispatch_throttle_bytes = 0
>
>         throttler_perf_counter = false
>
>
>
> mon osd min down reporters = 25
>
>
>
> [osd]
>
> osd scrub max interval = 1209600
>
> osd scrub min interval = 604800
>
> osd scrub load threshold = 3.0
>
> osd max backfills = 1
>
> osd recovery max active = 1
>
> # IO Scheduler settings
>
> osd scrub sleep = 1.0
>
> osd disk thread ioprio class = idle
>
> osd disk thread ioprio priority = 7
>
> osd scrub chunk max = 1
>
> osd scrub chunk min = 1
>
> osd deep scrub stride = 1048576
>
> filestore queue max ops = 10000
>
> filestore max sync interval = 30
>
> filestore min sync interval = 29
>
>
>
> osd deep scrub interval = 2592000
>
>         osd heartbeat grace = 240
>
>         osd heartbeat interval = 12
>
>         osd mon report interval max = 120
>
>         osd mon report interval min = 5
>
>
>
>        osd_client_message_size_cap = 0
>
>         osd_client_message_cap = 0
>
>         osd_enable_op_tracker = false
>
>
>
>         osd crush update on start = false
>
>
>
> [client]
>
>         rbd cache = true
>
>         rbd cache size = 67108864 # 64mb
>
>         rbd cache max dirty = 50331648 # 48mb
>
>         rbd cache target dirty = 33554432 # 32mb
>
>         rbd cache writethrough until flush = true # It's by default
>
>         rbd cache max dirty age = 2
>
>         admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
>
>
>
>
>
> Br,
>
> Tuomas
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
