On Mar 11, 2016 3:12 PM, "Alexander Gubanov" <sht...@gmail.com> wrote:
>
> Sorry, I didn't have time to answer.
>
> > First, you said 2 OSDs crashed every time. From the log you pasted,
> > it makes sense to do something for osd.3.
>
> The problem is one PG, 3.2. This PG is on osd.3 and osd.16, and both of
> these OSDs crashed every time.
>
> >> rm -rf
> >> /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>
> > What confuses me now is this.
> > Did osd.4 also crash like osd.3?
>
> I thought that the problem was osd.13 or osd.16. I tried to disable these OSDs:
> # ceph osd crush reweight osd.3 0
> # ceph osd crush reweight osd.16 0
> but when I did it, 2 other OSDs crashed, and one of them was osd.4; PG 3.2
> was on osd.4.
>
> After this I decided to remove the cache pool.
> Now I'm moving all data to a new big SSD, and so far all is all right.
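The cache-pool removal mentioned above is usually done by switching the cache tier to forward mode, flushing and evicting its objects, and then detaching it from the base pool. A minimal sketch, assuming hypothetical pool names cold-pool (the backing pool) and hot-cache (the cache tier), neither of which is the poster's actual pool; exact flags can vary by Ceph release:

  # Pool names are placeholders, not taken from this thread.
  ceph osd tier cache-mode hot-cache forward     # stop caching new writes
  rados -p hot-cache cache-flush-evict-all       # flush and evict cached objects
  ceph osd tier remove-overlay cold-pool         # stop redirecting client I/O
  ceph osd tier remove cold-pool hot-cache       # detach the cache tier from the base pool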
Thanks for letting me know. That is good to know.
I hope you are playing with Ceph again!

> On Fri, Mar 4, 2016 at 10:44 AM, Shinobu Kinjo <shinobu...@gmail.com> wrote:
>>
>> Thank you for your explanation.
>>
>> > Every time, 2 of the 18 OSDs are crashing. I think it happens when PG
>> > replication runs, because only 2 OSDs crash and every time they are the
>> > same ones.
>>
>> First, you said 2 OSDs crashed every time. From the log you pasted,
>> it makes sense to do something for osd.3.
>>
>> > rm -rf
>> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>>
>> What confuses me now is this.
>> Did osd.4 also crash like osd.3?
>>
>> > -1> 2016-02-24 04:51:45.904673 7fd995026700 5 -- op tracker -- , seq: 19231, time: 2016-02-24 04:51:45.904673, event: started, request: osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
>>
>> The crash seems to happen during this operation, so what I really want to
>> know is what this message implies.
>> Did you check osd.13?
>>
>> Anyhow, your cluster is now fine... no?
>> That's good news.
>>
>> Cheers,
>> Shinobu
>>
>> On Fri, Mar 4, 2016 at 11:05 AM, Alexander Gubanov <sht...@gmail.com> wrote:
>> > I decided to stop using the SSD cache pool and create just 2 pools: the
>> > 1st only of SSDs for fast storage, the 2nd only of HDDs for slow storage.
>> > As for this file, honestly, I don't know why it is created. As I said, I
>> > flush the journal of the failed OSD, remove this file, and then start the
>> > OSD daemon:
>> >
>> > ceph-osd --flush-journal osd.3
>> > rm -rf
>> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>> > service ceph start osd.3
>> >
>> > But if I turn the cache pool off, the file isn't created:
>> >
>> > ceph osd tier cache-mode ${cache_pool} forward
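The two-pool layout Alexander describes (one pool placed only on SSDs, one only on HDDs) is normally expressed with separate CRUSH roots and rules rather than a cache tier. A minimal sketch, assuming hypothetical bucket, rule, and pool names (ssd-root, hdd-root, ssd-rule, hdd-rule, fast, slow) and placeholder PG counts, none of which come from this thread:

  # All names and PG counts below are illustrative assumptions.
  ceph osd crush add-bucket ssd-root root
  ceph osd crush add-bucket hdd-root root
  # move the SSD-backed and HDD-backed hosts under the matching root, e.g.:
  #   ceph osd crush move ssd-host1 root=ssd-root
  ceph osd crush rule create-simple ssd-rule ssd-root host
  ceph osd crush rule create-simple hdd-rule hdd-root host
  ceph osd pool create fast 128 128 replicated ssd-rule
  ceph osd pool create slow 128 128 replicated hdd-rule

With separate rules, images can be created directly in the fast or slow pool, which sidesteps the cache-tier promotion path entirely.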