Re: [ceph-users] OSD went down but no idea why

2018-01-30 Thread blackpiglet J.
I found some logs saying osd.91 is down. I think the same should apply to osd.9. I am not sure what would cause the OSD process to be treated by its peers as down. 2018-01-30 06:39:33.767747 7f2402ef9700 1 mon.ubuntuser8@0(leader).log v424396 check_sub sending message to client.164108 10.1.248.8:0/3388257888 wi
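
Not something from the original thread, but a quick cross-check when logs like this show up is to ask the cluster itself which OSDs it currently considers down. A minimal sketch using the standard ceph CLI (assumes an admin node with a working client keyring; the JSON field names are as seen in Luminous-era `ceph osd tree --format json` output):

#!/usr/bin/env python3
"""Hypothetical helper, not from the thread: list OSDs the cluster reports as down."""
import json
import subprocess

def down_osds():
    # `ceph osd tree --format json` returns a JSON tree of CRUSH nodes;
    # OSD entries carry a "status" field of "up" or "down".
    out = subprocess.run(
        ["ceph", "osd", "tree", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    tree = json.loads(out)
    return [n["name"] for n in tree.get("nodes", [])
            if n.get("type") == "osd" and n.get("status") == "down"]

if __name__ == "__main__":
    print("down OSDs:", down_osds())

If osd.9 and osd.91 show up here while the daemons are still running, the next question is why their peers stopped getting heartbeat replies.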

Re: [ceph-users] OSD went down but no idea why

2018-01-30 Thread blackpiglet J.
I am really not sure why the monitor marked the OSD down: "Monitor daemon marked osd.9 down, but it is still running" 2018-01-30 16:07 GMT+08:00 blackpiglet J. : > Guys, > > We had set up a five-node Ceph cluster. Four are OSD servers and the > other one is MON and MGR. > Recently, during RGW s
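
Again not from the thread itself, but the usual mechanism behind "marked down but still running" is: peer OSDs stop getting heartbeat replies within osd_heartbeat_grace seconds, report that to the monitor, and once enough reporters agree (mon_osd_min_down_reporters) the monitor marks the OSD down even though the process is alive. A small sketch for reading the heartbeat settings on the affected host, assuming osd.9's admin socket is reachable locally:

#!/usr/bin/env python3
"""Hypothetical check, not from the thread: read the heartbeat settings that
decide when peers report an OSD as dead. Run on the OSD host, since
`ceph daemon` talks to the local admin socket."""
import json
import subprocess

OPTIONS = ["osd_heartbeat_interval", "osd_heartbeat_grace"]

def get_option(daemon, option):
    # `ceph daemon <name> config get <option>` prints a small JSON object,
    # e.g. {"osd_heartbeat_grace": "20"}.
    out = subprocess.run(
        ["ceph", "daemon", daemon, "config", "get", option],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)[option]

if __name__ == "__main__":
    for opt in OPTIONS:
        print(opt, "=", get_option("osd.9", opt))

In practice a running OSD usually misses that grace window because of network problems or because the daemon is stalled under heavy load.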

[ceph-users] OSD went down but no idea why

2018-01-30 Thread blackpiglet J.
Guys, we had set up a five-node Ceph cluster. Four are OSD servers and the other one is MON and MGR. Recently, during an RGW stability test, the RGW default pool default.rgw.buckets.data was accidentally written full. As a result, RGW got stuck. We didn't know the exact steps to recover, so we deleted a
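
The archive preview cuts off here, but since the trigger was default.rgw.buckets.data being written full, a hedged first step (not from the original mail) is to look at per-pool usage and the cluster full/nearfull ratios, which are what make RGW writes block. Sketch, again with the plain ceph CLI:

#!/usr/bin/env python3
"""Hypothetical helper, not from the thread: show per-pool usage and the
cluster full ratios that matter when default.rgw.buckets.data fills up."""
import subprocess

def run(cmd):
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

if __name__ == "__main__":
    # Per-pool usage, including the RGW data pool.
    print(run(["ceph", "df", "detail"]))
    # `ceph osd dump` includes the full_ratio / nearfull_ratio lines.
    for line in run(["ceph", "osd", "dump"]).splitlines():
        if "full_ratio" in line:
            print(line)

From there, freeing space (or temporarily raising the full ratio with `ceph osd set-full-ratio`) is usually what unblocks the stuck writes.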