Sorry, I didn't have time to answer.

> 1st you said, 2 osds were crashed every time. From the log you pasted,
> it makes sense to do something for osd.3.
The problem is one PG, 3.2. This PG is on osd.3 and osd.16, and both of these OSDs crashed every time.

> > rm -rf
> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>
> What makes me confused now is this.
> Was osd.4 also crashed like osd.3?

I thought that the problem was osd.3 or osd.16, so I tried to take these OSDs out of data placement by setting their CRUSH weight to 0:

# ceph osd crush reweight osd.3 0
# ceph osd crush reweight osd.16 0

but when I did that, two other OSDs crashed; one of them was osd.4, and PG 3.2 was now on osd.4. After this I decided to remove the cache pool. Now I'm moving all the data to a new, big SSD, and so far everything is all right.

On Fri, Mar 4, 2016 at 10:44 AM, Shinobu Kinjo <shinobu...@gmail.com> wrote:
> Thank you for your explanation.
>
> > Every time 2 of 18 OSDs are crashing. I think it's happening when run PG
> > replication because crashing only 2 OSDs and every time they're are the
> > same.
>
> 1st you said, 2 osds were crashed every time. From the log you pasted,
> it makes sense to do something for osd.3.
>
> > rm -rf
> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>
> What makes me confused now is this.
> Was osd.4 also crashed like osd.3?
>
> > -1> 2016-02-24 04:51:45.904673 7fd995026700 5 -- op tracker -- ,
> > seq: 19231, time: 2016-02-24 04:51:45.904673, event: started, request:
> > osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max
> > 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone
> > e13252) v4
>
> And crash seems to happen during this process, what I really want to
> know is what this message inferred.
> Did you check osd.13?
>
> Anyhow your cluster is now fine...no?
> That's good news.
>
> Cheers,
> Shinobu
>
> On Fri, Mar 4, 2016 at 11:05 AM, Alexander Gubanov <sht...@gmail.com> wrote:
> > I decided to refuse use of ssd cache pool and create just 2 pool. 1st pool
> > only of ssd for fast storage 2nd only of hdd for slow storage.
> > What about this file, honestly, I don't know why it is created. As I say I
> > flush the journal for fallen OSD and remove this file and then I start osd
> > damon:
> >
> > ceph-osd --flush-journal osd.3
> > rm -rf
> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
> > service ceph start osd.3
> >
> > But if I turn the cache pool off the file isn't created:
> >
> > ceph osd tier cache-mode ${cahec_pool} forward
>
> --
> Email:
> shin...@linux.com
> GitHub:
> shinobu-x
> Blog:
> Life with Distributed Computational System based on OpenSource

--
Alexander Gubanov
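P.S. In case it helps someone hitting the same problem: checking where PG 3.2 lives and the cache tier removal I mention above come down to roughly the sketch below. The pool names hot-storage (cache pool) and cold-storage (backing pool) are placeholders, not my real pool names, and newer releases may also ask for --yes-i-really-mean-it on the cache-mode change.

# see which OSDs currently serve the problem PG
ceph pg map 3.2

# stop the cache tier from taking new writes, then drain it
ceph osd tier cache-mode hot-storage forward
rados -p hot-storage cache-flush-evict-all

# once it is empty, detach the cache pool from the backing pool
ceph osd tier remove-overlay cold-storage
ceph osd tier remove cold-storage hot-storage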