I'm sorry, I wouldn't know; I'm on Jewel. Is your cluster HEALTH_OK now?

Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


On Sun, May 13, 2018 at 6:29 AM Marc Roos <m.r...@f1-outsourcing.eu> wrote:
>
> In luminous
> osd_recovery_threads = osd_disk_threads ?
> osd_recovery_sleep = osd_recovery_sleep_hdd ?
>
> Or is speeding up recovery a lot different in luminous?
>
> [@~]# ceph daemon osd.0 config show | grep osd | grep thread
>     "osd_command_thread_suicide_timeout": "900",
>     "osd_command_thread_timeout": "600",
>     "osd_disk_thread_ioprio_class": "",
>     "osd_disk_thread_ioprio_priority": "-1",
>     "osd_disk_threads": "1",
>     "osd_op_num_threads_per_shard": "0",
>     "osd_op_num_threads_per_shard_hdd": "1",
>     "osd_op_num_threads_per_shard_ssd": "2",
>     "osd_op_thread_suicide_timeout": "150",
>     "osd_op_thread_timeout": "15",
>     "osd_peering_wq_threads": "2",
>     "osd_recovery_thread_suicide_timeout": "300",
>     "osd_recovery_thread_timeout": "30",
>     "osd_remove_thread_suicide_timeout": "36000",
>     "osd_remove_thread_timeout": "3600",
>
> -----Original Message-----
> From: Webert de Souza Lima [mailto:webert.b...@gmail.com]
> Sent: vrijdag 11 mei 2018 20:34
> To: ceph-users
> Subject: Re: [ceph-users] Node crash, filesytem not usable
>
> This message seems to be very concerning:
>
>     mds0: Metadata damage detected
>
> But for the rest, the cluster still seems to be recovering. You could
> try to speed things up with ceph tell, like:
>
>     ceph tell osd.* injectargs --osd_max_backfills=10
>     ceph tell osd.* injectargs --osd_recovery_sleep=0.0
>     ceph tell osd.* injectargs --osd_recovery_threads=2
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ
>
> On Fri, May 11, 2018 at 3:06 PM Daniel Davidson <dani...@igb.illinois.edu> wrote:
>
> > Below is the information you were asking for. I think they are
> > size=2, min_size=1.
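For what it's worth, a quick way to see what a running Luminous OSD actually honours for the recovery options discussed above is to query them directly over the admin socket instead of grepping for "thread". This is only a sketch and assumes it is run on the host carrying osd.0 (any OSD id works):

  ceph daemon osd.0 config get osd_recovery_sleep
  ceph daemon osd.0 config get osd_recovery_sleep_hdd
  ceph daemon osd.0 config get osd_recovery_sleep_ssd
  ceph daemon osd.0 config show | grep -E 'recovery_sleep|recovery_max_active|max_backfills'

The _hdd/_ssd variants are the per-device-class defaults in Luminous, so checking all of them gives a fuller picture than the plain osd_recovery_sleep value alone.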
> > Dan
> >
> > # ceph status
> >     cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
> >      health HEALTH_ERR
> >             140 pgs are stuck inactive for more than 300 seconds
> >             64 pgs backfill_wait
> >             76 pgs backfilling
> >             140 pgs degraded
> >             140 pgs stuck degraded
> >             140 pgs stuck inactive
> >             140 pgs stuck unclean
> >             140 pgs stuck undersized
> >             140 pgs undersized
> >             210 requests are blocked > 32 sec
> >             recovery 38725029/695508092 objects degraded (5.568%)
> >             recovery 10844554/695508092 objects misplaced (1.559%)
> >             mds0: Metadata damage detected
> >             mds0: Behind on trimming (71/30)
> >             noscrub,nodeep-scrub flag(s) set
> >      monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
> >             election epoch 824, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
> >       fsmap e144928: 1/1/1 up {0=ceph-0=up:active}, 1 up:standby
> >      osdmap e35814: 32 osds: 30 up, 30 in; 140 remapped pgs
> >             flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
> >       pgmap v43142427: 1536 pgs, 2 pools, 762 TB data, 331 Mobjects
> >             1444 TB used, 1011 TB / 2455 TB avail
> >             38725029/695508092 objects degraded (5.568%)
> >             10844554/695508092 objects misplaced (1.559%)
> >                 1396 active+clean
> >                   76 undersized+degraded+remapped+backfilling+peered
> >                   64 undersized+degraded+remapped+wait_backfill+peered
> >   recovery io 1244 MB/s, 1612 keys/s, 705 objects/s
> >
> > ID  WEIGHT     TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >  -1 2619.54541 root default
> >  -2  163.72159     host ceph-0
> >   0   81.86079         osd.0         up  1.00000          1.00000
> >   1   81.86079         osd.1         up  1.00000          1.00000
> >  -3  163.72159     host ceph-1
> >   2   81.86079         osd.2         up  1.00000          1.00000
> >   3   81.86079         osd.3         up  1.00000          1.00000
> >  -4  163.72159     host ceph-2
> >   8   81.86079         osd.8         up  1.00000          1.00000
> >   9   81.86079         osd.9         up  1.00000          1.00000
> >  -5  163.72159     host ceph-3
> >  10   81.86079         osd.10        up  1.00000          1.00000
> >  11   81.86079         osd.11        up  1.00000          1.00000
> >  -6  163.72159     host ceph-4
> >   4   81.86079         osd.4         up  1.00000          1.00000
> >   5   81.86079         osd.5         up  1.00000          1.00000
> >  -7  163.72159     host ceph-5
> >   6   81.86079         osd.6         up  1.00000          1.00000
> >   7   81.86079         osd.7         up  1.00000          1.00000
> >  -8  163.72159     host ceph-6
> >  12   81.86079         osd.12        up  0.79999          1.00000
> >  13   81.86079         osd.13        up  1.00000          1.00000
> >  -9  163.72159     host ceph-7
> >  14   81.86079         osd.14        up  1.00000          1.00000
> >  15   81.86079         osd.15        up  1.00000          1.00000
> > -10  163.72159     host ceph-8
> >  16   81.86079         osd.16        up  1.00000          1.00000
> >  17   81.86079         osd.17        up  1.00000          1.00000
> > -11  163.72159     host ceph-9
> >  18   81.86079         osd.18        up  1.00000          1.00000
> >  19   81.86079         osd.19        up  1.00000          1.00000
> > -12  163.72159     host ceph-10
> >  20   81.86079         osd.20        up  1.00000          1.00000
> >  21   81.86079         osd.21        up  1.00000          1.00000
> > -13  163.72159     host ceph-11
> >  22   81.86079         osd.22        up  1.00000          1.00000
> >  23   81.86079         osd.23        up  1.00000          1.00000
> > -14  163.72159     host ceph-12
> >  24   81.86079         osd.24        up  1.00000          1.00000
> >  25   81.86079         osd.25        up  1.00000          1.00000
> > -15  163.72159     host ceph-13
> >  26   81.86079         osd.26      down        0          1.00000
> >  27   81.86079         osd.27      down        0          1.00000
> > -16  163.72159     host ceph-14
> >  28   81.86079         osd.28        up  1.00000          1.00000
> >  29   81.86079         osd.29        up  1.00000          1.00000
> > -17  163.72159     host ceph-15
> >  30   81.86079         osd.30        up  1.00000          1.00000
> >  31   81.86079         osd.31        up  1.00000          1.00000
> >
> > On 05/11/2018 11:56 AM, David Turner wrote:
> > > What are some outputs of commands to show us the state of your
> > > cluster? Most notable is `ceph status`, but `ceph osd tree` would be
> > > helpful. What are the sizes of the pools in your cluster? Are they all
> > > size=3, min_size=2?
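As a side note, the size/min_size question can be answered straight from the cluster rather than from memory. A minimal check, with <pool-name> as a placeholder for the CephFS data and metadata pools:

  ceph osd pool ls detail              # one line per pool, including "replicated size" and "min_size"
  ceph osd pool get <pool-name> size
  ceph osd pool get <pool-name> min_size

PGs that are peered but not active, as in the status output above, generally mean the PG cannot meet min_size with the OSDs currently up, which is why the exact pool settings matter here.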
> > > On Fri, May 11, 2018 at 12:05 PM Daniel Davidson <dani...@igb.illinois.edu> wrote:
> > >
> > > > Hello,
> > > >
> > > > Today we had a node crash, and looking at it, it seems there is a
> > > > problem with the RAID controller, so it is not coming back up, maybe
> > > > ever. It corrupted the local filesystem for the ceph storage there.
> > > >
> > > > The remainder of our storage (10.2.10) cluster is running, and it
> > > > looks to be repairing, and our min_size is set to 2. I would normally
> > > > expect the system to keep running from an end-user perspective when
> > > > this happens, but the system is down. All mounts that were up when
> > > > this started look to be stale, and new mounts give the following
> > > > error:
> > > >
> > > > # mount -t ceph ceph-0:/ /test/ -o name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev,rbytes
> > > > mount error 5 = Input/output error
> > > >
> > > > Any suggestions?
> > > >
> > > > Dan
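In case it helps anyone hitting the same symptoms, a rough triage sketch; the MDS daemon name ceph-0 is taken from the fsmap above, and the damage listing has to be run on the node hosting the active MDS:

  ceph health detail                # spells out which PGs are stuck inactive/unclean and why
  ceph pg dump_stuck inactive       # inactive PGs are what block client I/O, hence the stale mounts
  ceph daemon mds.ceph-0 damage ls  # details behind the "mds0: Metadata damage detected" warning

None of these change cluster state; they only report, so they should be safe to run while recovery is still in progress.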