Sorry, I didn't have time to answer.

> 1st you said, 2 osds were crashed every time. From the log you pasted,
> it makes sense to do something for osd.3.
The problem is one PG, 3.2. This PG is on osd.3 and osd.16, and both of these OSDs crashed every time.

> > rm -rf
> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>
> What makes me confused now is this.
> Was osd.4 also crashed like osd.3?

I thought that the problem was osd.3 or osd.16, so I tried to take these OSDs out of data placement by setting their CRUSH weight to 0:

# ceph osd crush reweight osd.3 0
# ceph osd crush reweight osd.16 0

but when I did that, two other OSDs crashed; one of them was osd.4, and PG 3.2 was now on osd.4. After this I decided to remove the cache pool. Now I'm moving all the data to a new, big SSD, and so far everything is all right.

On Fri, Mar 4, 2016 at 10:44 AM, Shinobu Kinjo <shinobu...@gmail.com> wrote:
> Thank you for your explanation.
>
> > Every time 2 of 18 OSDs are crashing. I think it's happening when run PG
> > replication because crashing only 2 OSDs and every time they're are the
> > same.
>
> 1st you said, 2 osds were crashed every time. From the log you pasted,
> it makes sense to do something for osd.3.
>
> > rm -rf
> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>
> What makes me confused now is this.
> Was osd.4 also crashed like osd.3?
>
> > -1> 2016-02-24 04:51:45.904673 7fd995026700 5 -- op tracker -- ,
> > seq: 19231, time: 2016-02-24 04:51:45.904673, event: started, request:
> > osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max
> > 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone
> > e13252) v4
>
> And crash seems to happen during this process, what I really want to
> know is what this message inferred.
> Did you check osd.13?
>
> Anyhow your cluster is now fine...no?
> That's good news.
>
> Cheers,
> Shinobu
>
> On Fri, Mar 4, 2016 at 11:05 AM, Alexander Gubanov <sht...@gmail.com> wrote:
> > I decided to refuse use of ssd cache pool and create just 2 pool. 1st pool
> > only of ssd for fast storage 2nd only of hdd for slow storage.
> > What about this file, honestly, I don't know why it is created. As I say I
> > flush the journal for fallen OSD and remove this file and then I start osd
> > damon:
> >
> > ceph-osd --flush-journal osd.3
> > rm -rf
> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
> > service ceph start osd.3
> >
> > But if I turn the cache pool off the file isn't created:
> >
> > ceph osd tier cache-mode ${cahec_pool} forward
>
> --
> Email:
> shin...@linux.com
> GitHub:
> shinobu-x
> Blog:
> Life with Distributed Computational System based on OpenSource

--
Alexander Gubanov
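P.S. In case it helps someone hitting the same problem: checking where PG 3.2 lives and the cache tier removal I mention above come down to roughly the sketch below. The pool names hot-storage (cache pool) and cold-storage (backing pool) are placeholders, not my real pool names, and newer releases may also ask for --yes-i-really-mean-it on the cache-mode change.

# see which OSDs currently serve the problem PG
ceph pg map 3.2

# stop the cache tier from taking new writes, then drain it
ceph osd tier cache-mode hot-storage forward
rados -p hot-storage cache-flush-evict-all

# once it is empty, detach the cache pool from the backing pool
ceph osd tier remove-overlay cold-storage
ceph osd tier remove cold-storage hot-storage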